- 12 Apr, 2018 40 commits
-
-
Borislav Petkov authored
commit bb8c13d6 upstream. Emanuel reported an issue with a hang during microcode update because my dumb idea to use one atomic synchronization variable for both rendezvous - before and after update - was simply bollocks: microcode: microcode_reload_late: late_cpus: 4 microcode: __reload_late: cpu 2 entered microcode: __reload_late: cpu 1 entered microcode: __reload_late: cpu 3 entered microcode: __reload_late: cpu 0 entered microcode: __reload_late: cpu 1 left microcode: Timeout while waiting for CPUs rendezvous, remaining: 1 CPU1 above would finish, leave and the others will still spin waiting for it to join. So do two synchronization atomics instead, which makes the code a lot more straightforward. Also, since the update is serialized and it also takes quite some time per microcode engine, increase the exit timeout by the number of CPUs on the system. That's ok because the moment all CPUs are done, that timeout will be cut short. Furthermore, panic when some of the CPUs timeout when returning from a microcode update: we can't allow a system with not all cores updated. Also, as an optimization, do not do the exit sync if microcode wasn't updated. Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20180314183615.17629-2-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov authored
commit 2613f36e upstream. Return UCODE_NEW from the scanning functions to denote that new microcode was found and only then attempt the expensive synchronization dance. Reported-by: Emanuel Czirai <xftroxgpx@protonmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Emanuel Czirai <xftroxgpx@protonmail.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20180314183615.17629-1-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ashok Raj authored
commit a5321aec upstream. Original idea by Ashok, completely rewritten by Borislav. Before you read any further: the early loading method is still the preferred one and you should always do that. The following patch is improving the late loading mechanism for long running jobs and cloud use cases. Gather all cores and serialize the microcode update on them by doing it one-by-one to make the late update process as reliable as possible and avoid potential issues caused by the microcode update. [ Borislav: Rewrite completely. ] Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-8-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov authored
commit cfb52a5a upstream. ... so that any newer version can land in the cache and can later be fished out by the application functions. Do that before grabbing the hotplug lock. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-7-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov authored
commit d8c3b52c upstream. The cache might contain a newer patch - look in there first. A follow-on change will make sure newest patches are loaded into the cache of microcode patches. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-6-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ashok Raj authored
commit 30ec26da upstream. Avoid loading microcode if any of the CPUs are offline, and issue a warning. Having different microcode revisions on the system at any time is outright dangerous. [ Borislav: Massage changelog. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-4-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-5-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ashok Raj authored
commit 91df9fdf upstream. Updating microcode is less error prone when caches have been flushed and depending on what exactly the microcode is updating. For example, some of the issues around certain Broadwell parts can be addressed by doing a full cache flush. [ Borislav: Massage it and use native_wbinvd() in both cases. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-3-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-4-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ashok Raj authored
commit c182d2b7 upstream. After updating microcode on one of the threads of a core, the other thread sibling automatically gets the update since the microcode resources on a hyperthreaded core are shared between the two threads. Check the microcode revision on the CPU before performing a microcode update and thus save us the WRMSR 0x79 because it is a particularly expensive operation. [ Borislav: Massage changelog and coding style. ] Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: http://lkml.kernel.org/r/1519352533-15992-2-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-3-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov authored
commit 854857f5 upstream. It is a useless remnant from earlier times. Use the ucode_state enum directly. No functional change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Link: https://lkml.kernel.org/r/20180228102846.13447-2-bp@alien8.deSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov authored
commit 42ca8082 upstream. With some microcode upgrades, new CPUID features can become visible on the CPU. Check what the kernel has mirrored now and issue a warning hinting at possible things the user/admin can do to make use of the newly visible features. Originally-by: Ashok Raj <ashok.raj@intel.com> Tested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180216112640.11554-4-bp@alien8.deSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov authored
commit 1008c52c upstream. Add a callback function which the microcode loader calls when microcode has been updated to a newer revision. Do the callback only when no error was encountered during loading. Tested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180216112640.11554-3-bp@alien8.deSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Borislav Petkov authored
commit 3f1f576a upstream. ... so that callers can know when microcode was updated and act accordingly. Tested-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180216112640.11554-2-bp@alien8.deSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ard Biesheuvel authored
commit 019cd469 upstream. Most crypto drivers involving kernel mode NEON take care to put the code that actually touches the NEON register file in a separate compilation unit, to prevent the compiler from reordering code that preserves or restores the NEON context with code that may corrupt it. This is necessary because we currently have no way to express the restrictions imposed upon use of the NEON in kernel mode in a way that the compiler understands. However, in the case of aes-ce-cipher, it did not seem unreasonable to deviate from this rule, given how it does not seem possible for the compiler to reorder cross object function calls with asm blocks whose in- and output constraints reflect that it reads from and writes to memory. Now that LTO is being proposed for the arm64 kernel, it is time to revisit this. The link time optimization may replace the function calls to kernel_neon_begin() and kernel_neon_end() with instantiations of the IR that make up its implementation, allowing further reordering with the asm block. So let's clean this up, and move the asm() blocks into a separate .S file. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-By: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Matthias Kaehlcke <mka@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Poimboeuf authored
commit 3c1f0583 upstream. Since the ORC unwinder was made the default on x86_64, Clang-built defconfig kernels have triggered some new objtool warnings: drivers/gpu/drm/i915/i915_gpu_error.o: warning: objtool: i915_error_printf()+0x6c: return with modified stack frame drivers/gpu/drm/i915/intel_display.o: warning: objtool: pipe_config_err()+0xa6: return with modified stack frame The problem is that objtool has never seen clang-built binaries before. Shockingly enough, objtool is apparently able to follow the code flow mostly fine, except for one instruction sequence. Instead of a LEAVE instruction, clang restores RSP and RBP the long way: 67c: 48 89 ec mov %rbp,%rsp 67f: 5d pop %rbp Teach objtool about this new code sequence. Reported-and-test-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/fce88ce81c356eedcae7f00ed349cfaddb3363cc.1521741586.git.jpoimboe@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Khoroshilov authored
[ Upstream commit 0be86969 ] There are resources that are not dealocated on failure path in int3400_thermal_probe(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Christie authored
[ Upstream commit 810b8153 ] If we cannot setup a cmd because we run out of ring space or global pages release the blocks before sleeping. This prevents a deadlock where dev0 has waiting_blocks set and needs N blocks, but dev1 to devX have each allocated N / X blocks and also hit the global block limit so they went to sleep. find_free_blocks is not able to take the sleeping dev's blocks becaause their waiting_blocks is set and even if it was not the block returned by find_last_bit could equal dbi_max. The latter will probably never happen because DATA_BLOCK_BITS is so high but in the next patches DATA_BLOCK_BITS and TCMU_GLOBAL_MAX_BLOCKS will be settable so it might be lower and could happen. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Olsa authored
[ Upstream commit fa1195cc ] We need to increase output offset in each iteration, not decrease it as we currently do. I guess we were lucky to finish in most cases in first iteration, so the bug never showed. However it shows a lot when working with big (~4GB) size data. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 9c9f5a2f ("perf tools: Introduce copyfile_offset() function") Link: http://lkml.kernel.org/r/20180109133923.25406-1-jolsa@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arnd Bergmann authored
[ Upstream commit 148b974d ] While testing other changes, I discovered that gcc-7.2.1 produces badly optimized code for aes_encrypt/aes_decrypt. This is especially true when CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely large stack usage that in turn might cause kernel stack overflows: crypto/aes_generic.c: In function 'aes_encrypt': crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=] crypto/aes_generic.c: In function 'aes_decrypt': crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=] I verified that this problem exists on all architectures that are supported by gcc-7.2, though arm64 in particular is less affected than the others. I also found that gcc-7.1 and gcc-8 do not show the extreme stack usage but still produce worse code than earlier versions for this file, apparently because of optimization passes that generally provide a substantial improvement in object code quality but understandably fail to find any shortcuts in the AES algorithm. Possible workarounds include a) disabling -ftree-pre and -ftree-sra optimizations, this was an earlier patch I tried, which reliably fixed the stack usage, but caused a serious performance regression in some versions, as later testing found. b) disabling UBSAN on this file or all ciphers, as suggested by Ard Biesheuvel. This would lead to massively better crypto performance in UBSAN-enabled kernels and avoid the stack usage, but there is a concern over whether we should exclude arbitrary files from UBSAN at all. c) Forcing the optimization level in a different way. Similar to a), but rather than deselecting specific optimization stages, this now uses "gcc -Os" for this file, regardless of the CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option. This is a reliable workaround for the stack consumption on all architecture, and I've retested the performance results now on x86, cycles/byte (lower is better) for cbc(aes-generic) with 256 bit keys: -O2 -Os gcc-6.3.1 14.9 15.1 gcc-7.0.1 14.7 15.3 gcc-7.1.1 15.3 14.7 gcc-7.2.1 16.8 15.9 gcc-8.0.0 15.5 15.6 This implements the option c) by enabling forcing -Os on all compiler versions starting with gcc-7.1. As a workaround for PR83356, it would only be needed for gcc-7.2+ with UBSAN enabled, but since it also shows better performance on gcc-7.1 without UBSAN, it seems appropriate to use the faster version here as well. Side note: during testing, I also played with the AES code in libressl, which had a similar performance regression from gcc-6 to gcc-7.2, but was three times slower overall. It might be interesting to investigate that further and possibly port the Linux implementation into that. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651 Cc: Richard Biener <rguenther@suse.de> Cc: Jakub Jelinek <jakub@gcc.gnu.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miquel Raynal authored
[ Upstream commit 12663b44 ] Reads from NAND devices usually trigger bitflips, this is an expected behavior. While bitflips are under a given threshold, the MTD core returns 0. However, when the number of corrected bitflips is above this same threshold, -EUCLEAN is returned to inform the upper layer that this block is slightly dying and soon the ECC engine will be overtaken so actions should be taken to move the data out of it. This particular condition should not be treated like an error and the test should continue. Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans de Goede authored
[ Upstream commit faec44b6 ] We should not try to do any i2c transfers before the controller is resumed (which happens before our resume method gets called). So we need to disable our IRQ while suspended to enforce this. The code paths for devices with GPIOs for the int and reset pins already disable the IRQ the through goodix_free_irq(). This commit also disables the IRQ while suspended for devices without GPIOs for the int and reset pins. This fixes the i2c bus sometimes getting stuck after a suspend/resume causing the touchscreen to sometimes not work after a suspend/resume. This has been tested on a GPD pocked device. BugLink: https://github.com/nexus511/gpd-ubuntu-packages/issues/10 BugLink: https://www.reddit.com/r/GPDPocket/comments/7niut2/fix_for_broken_touch_after_resume_all_linux/Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Bastien Nocera <hadess@hadess.net> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nathan Fontenot authored
[ Upstream commit 09fb35ea ] Initiating a kdump via the command line can cause a pending interrupt to be handled by the ibmvnic driver when initializing the sub-CRQ irqs during driver initialization. NIP [d000000000ca34f0] ibmvnic_interrupt_rx+0x40/0xd0 [ibmvnic] LR [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0 Call Trace: [c000000047fcfde0] [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0 [c000000047fcfea0] [c00000000813317c] handle_irq_event_percpu+0x3c/0x90 [c000000047fcfee0] [c00000000813323c] handle_irq_event+0x6c/0xd0 [c000000047fcff10] [c0000000081385e0] handle_fasteoi_irq+0xf0/0x250 [c000000047fcff40] [c0000000081320a0] generic_handle_irq+0x50/0x80 [c000000047fcff60] [c000000008014984] __do_irq+0x84/0x1d0 [c000000047fcff90] [c000000008027564] call_do_irq+0x14/0x24 [c00000003c92af00] [c000000008014b70] do_IRQ+0xa0/0x120 [c00000003c92af50] [c000000008002594] hardware_interrupt_common+0x114/0x180 Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Shevchenko authored
[ Upstream commit 2a609abe ] On Intel Edison the Broadcom Wi-Fi card, which is connected to SDIO, requires 2.0v, while the host, according to Intel Merrifield TRM, supports 1.8v supply only. The card announces itself as mmc2: new ultra high speed DDR50 SDIO card at address 0001 Introduce a custom OCR mask for SDIO host controller on Intel Merrifield and add a special case to sdhci_set_power_noreg() to override 2.0v supply by enforcing 1.8v power choice. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Bohac authored
[ Upstream commit 2a3e83c6 ] On machines where the GART aperture is mapped over physical RAM /proc/vmcore contains the remapped range and reading it may cause hangs or reboots. In the past, the GART region was added into the resource map, implemented by commit 56dd669a ("[PATCH] Insert GART region into resource map") However, inserting the iomem_resource from the early GART code caused resource conflicts with some AGP drivers (bko#72201), which got avoided by reverting the patch in commit 707d4eef ("Revert [PATCH] Insert GART region into resource map"). This revert introduced the /proc/vmcore bug. The vmcore ELF header is either prepared by the kernel (when using the kexec_file_load syscall) or by the kexec userspace (when using the kexec_load syscall). Since we no longer have the GART iomem resource, the userspace kexec has no way of knowing which region to exclude from the ELF header. Changes from v1 of this patch: Instead of excluding the aperture from the ELF header, this patch makes /proc/vmcore return zeroes in the second kernel when attempting to read the aperture region. This is done by reusing the gart_oldmem_pfn_is_ram infrastructure originally intended to exclude XEN balooned memory. This works for both, the kexec_file_load and kexec_load syscalls. [Note that the GART region is the same in the first and second kernels: regardless whether the first kernel fixed up the northbridge/bios setting and mapped the aperture over physical memory, the second kernel finds the northbridge properly configured by the first kernel and the aperture never overlaps with e820 memory because the second kernel has a fake e820 map created from the crashkernel memory regions. Thus, the second kernel keeps the aperture address/size as configured by the first kernel.] register_oldmem_pfn_is_ram can only register one callback and returns an error if the callback has been registered already. Since XEN used to be the only user of this function, it never checks the return value. Now that we have more than one user, I added a WARN_ON just in case agp, XEN, or any other future user of register_oldmem_pfn_is_ram were to step on each other's toes. Fixes: 707d4eef ("Revert [PATCH] Insert GART region into resource map") Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Baoquan He <bhe@redhat.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: David Airlie <airlied@linux.ie> Cc: yinghai@kernel.org Cc: joro@8bytes.org Cc: kexec@lists.infradead.org Cc: Borislav Petkov <bp@alien8.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/20180106010013.73suskgxm7lox7g6@dwarf.suse.czSigned-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wei Yongjun authored
[ Upstream commit 76e28f5f ] Fix to return error code -ENOMEM from the error handling case instead of 0, as done elsewhere in this function. Fixes: 5a2a3002 ("gpio: Add gpio driver support for ThunderX and OCTEON-TX") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Parav Pandit authored
[ Upstream commit 89838118 ] The 'if' logic in ucma_query_path was broken with OPA was introduced and started to treat RoCE paths as as OPA paths. Invert the logic of the 'if' so only OPA paths are treated as OPA paths. Otherwise the path records returned to rdma_cma users are mangled when in RoCE mode. Fixes: 57520751 ("IB/SA: Add OPA path record type") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shivasharan S authored
[ Upstream commit f3f7920b ] Issue - Driver returns DID_NO_CONNECT when unload is in progress, indicated using instance->unload flag. In case of dynamic unload of driver, this flag is set before calling scsi_remove_host(). While doing manual driver unload, user will see lots of prints for Sync Cache command with DID_NO_CONNECT status. Fix - Set the instance->unload flag after scsi_remove_host(). Allow device removal process to be completed and do not block any command before that. SCSI commands (like SYNC_CACHE) are received (as part of scsi_remove_host) by driver during unload will be submitted further down to the drives. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shivasharan S authored
[ Upstream commit 7ada701d ] Currently driver does not validate ldcount provided by firmware. If the value is invalid, fail RAID map validation accordingly. This issue is rare to hit in field and is fixed as part of code review. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arjun Vynipadath authored
[ Upstream commit ea0a4210 ] We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and the extant logic would flag that as an error. This was already fixed in cxgb4 driver with "92ddcc7b cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size is 64KB". Original Work by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jacob Keller authored
[ Upstream commit 44b034b4 ] In i40evf_reset_task we use netif_running() to determine whether or not the device is currently up. This allows us to properly free queue memory and shut down things before we request the hardware reset. It turns out that we cannot be guaranteed of netif_running() returning false until the device is fully up, as the kernel core code sets __LINK_STATE_START prior to calling .ndo_open. Since we're not holding the rtnl_lock(), it's possible that the driver's i40evf_open handler function is currently being called while we're resetting. We can't simply hold the rtnl_lock() while checking netif_running() as this could cause a deadlock with the i40evf_open() function. Additionally, we can't avoid the deadlock by holding the rtnl_lock() over the whole reset path, as this essentially serializes all resets, and can cause massive delays if we have multiple VFs on a system. Instead, lets just check our own internal state __I40EVF_RUNNING state field. This allows us to ensure that the state is correct and is only set after we've finished bringing the device up. Without this change we might free data structures about device queues and other memory before they've been fully allocated. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Upstream commit 06028d15 ] In order for userspace application to signal host, it needs the host to support the monitor page property. Check for the flag and fail if this is not supported. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christophe JAILLET authored
[ Upstream commit 68fa24f9 ] We should not call edac_mc_del_mc() if a corresponding call to edac_mc_add_mc() has not been performed yet. So here, we should go to err instead of err2 to branch at the right place of the error handling path. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180107205400.14068-1-christophe.jaillet@wanadoo.frSigned-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Valente authored
[ Upstream commit 52257ffb ] For each pair [device for which bfq is selected as I/O scheduler, group in blkio/io], bfq maintains a corresponding bfq group. Each such bfq group contains a set of async queues, with each async queue created on demand, i.e., when some I/O request arrives for it. On creation, an async queue gets an extra reference, to make sure that the queue is not freed as long as its bfq group exists. Accordingly, to allow the queue to be freed after the group exited, this extra reference must released on group exit. The above holds also for a bfq root group, i.e., for the bfq group corresponding to the root blkio/io root for a given device. Yet, by mistake, the references to the existing async queues of a root group are not released when the latter exits. This causes a memory leak when the instance of bfq for a given device exits. In a similar vein, bfqg_stats_xfer_dead is not executed for a root group. This commit fixes bfq_pd_offline so that the latter executes the above missing operations for a root group too. Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reported-by: Guoqing Jiang <gqjiang@suse.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Davide Ferrari <davideferrari8@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tony Lindgren authored
[ Upstream commit ea3d8465 ] Some devices have the control dlci stay in ADM mode instead of the UA mode. This can seen at least on droid 4 when trying to open the ts 27.010 mux port. Enabling n_gsm debug mode shows the control dlci always respond with DM to SABM instead of UA: # modprobe n_gsm debug=0xff # ldattach -d GSM0710 /dev/ttyS0 & gsmld_output: 00000000: f9 03 3f 01 1c f9 --> 0) C: SABM(P) gsmld_receive: 00000000: f9 03 1f 01 36 f9 <-- 0) C: DM(P) ... $ minicom -D /dev/gsmtty1 minicom: cannot open /dev/gsmtty1: No error information $ strace minicom -D /dev/gsmtty1 ... open("/dev/gsmtty1", O_RDWR|O_NOCTTY|O_NONBLOCK|O_LARGEFILE) = -1 EL2HLT Note that this is different issue from other n_gsm -EL2HLT issues such as timeouts when the control dlci does not respond at all. The ADM mode seems to be a quite common according to "RF Wireless World" article "GSM Issue-UE sends SABM and gets a DM response instead of UA response": This issue is most commonly observed in GSM networks where in UE sends SABM and expects network to send UA response but it ends up receiving DM response from the network. SABM stands for Set asynchronous balanced mode, UA stands for Unnumbered Acknowledge and DA stands for Disconnected Mode. An RLP entity can be in one of two modes: - Asynchronous Balanced Mode (ABM) - Asynchronous Disconnected Mode (ADM) Currently Linux kernel closes the control dlci after several retries in gsm_dlci_t1() on DM. This causes n_gsm /dev/gsmtty ports to produce error code -EL2HLT when trying to open them as the closing of control dlci has already set gsm->dead. Let's fix the issue by allowing control dlci stay in ADM mode after the retries so the /dev/gsmtty ports can be opened and used. It seems that it might take several attempts to get any response from the control dlci, so it's best to allow ADM mode only after the SABM retries are done. Note that for droid 4 additional patches are needed to mux the ttyS0 pins and to toggle RTS gpio_149 to wake up the mdm6600 modem are also needed to use n_gsm. And the mdm6600 modem needs to be powered on. Cc: linux-serial@vger.kernel.org Cc: Alan Cox <alan@llwyncelyn.cymru> Cc: Jiri Prchal <jiri.prchal@aksignal.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Marcel Partap <mpartap@gmx.net> Cc: Michael Scott <michael.scott@linaro.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Russ Gorby <russ.gorby@intel.com> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Sebastian Reichel <sre@kernel.org> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ming Lei authored
[ Upstream commit 8ab0b7dc ] HW queues may be unmapped in some cases, such as blk_mq_update_nr_hw_queues(), then we need to check it before calling blk_mq_tag_idle(), otherwise the following kernel oops can be triggered, so fix it by checking if the hw queue is unmapped since it doesn't make sense to idle the tags any more after hw queues are unmapped. [ 440.771298] Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma] [ 440.779104] task: ffff894bae755ee0 ti: ffff893bf9bc8000 task.ti: ffff893bf9bc8000 [ 440.788359] RIP: 0010:[<ffffffffb730e2b4>] [<ffffffffb730e2b4>] __blk_mq_tag_idle+0x24/0x40 [ 440.798697] RSP: 0018:ffff893bf9bcbd10 EFLAGS: 00010286 [ 440.805538] RAX: 0000000000000000 RBX: ffff895bb131dc00 RCX: 000000000000011f [ 440.814426] RDX: 00000000ffffffff RSI: 0000000000000120 RDI: ffff895bb131dc00 [ 440.823301] RBP: ffff893bf9bcbd10 R08: 000000000001b860 R09: 4a51d361c00c0000 [ 440.832193] R10: b5907f32b4cc7003 R11: ffffd6cabfb57000 R12: ffff894bafd1e008 [ 440.841091] R13: 0000000000000001 R14: ffff895baf770000 R15: 0000000000000080 [ 440.849988] FS: 0000000000000000(0000) GS:ffff894bbdcc0000(0000) knlGS:0000000000000000 [ 440.859955] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 440.867274] CR2: 0000000000000008 CR3: 000000103d098000 CR4: 00000000001407e0 [ 440.876169] Call Trace: [ 440.879818] [<ffffffffb7309d68>] blk_mq_exit_hctx+0xd8/0xe0 [ 440.887051] [<ffffffffb730dc40>] blk_mq_free_queue+0xf0/0x160 [ 440.894465] [<ffffffffb72ff679>] blk_cleanup_queue+0xd9/0x150 [ 440.901881] [<ffffffffc08a802b>] nvme_ns_remove+0x5b/0xb0 [nvme_core] [ 440.910068] [<ffffffffc08a811b>] nvme_remove_namespaces+0x3b/0x60 [nvme_core] [ 440.919026] [<ffffffffc08b817b>] __nvme_rdma_remove_ctrl+0x2b/0xb0 [nvme_rdma] [ 440.928079] [<ffffffffc08b8237>] nvme_rdma_del_ctrl_work+0x17/0x20 [nvme_rdma] [ 440.937126] [<ffffffffb70ab58a>] process_one_work+0x17a/0x440 [ 440.944517] [<ffffffffb70ac3a8>] worker_thread+0x278/0x3c0 [ 440.951607] [<ffffffffb70ac130>] ? manage_workers.isra.24+0x2a0/0x2a0 [ 440.959760] [<ffffffffb70b352f>] kthread+0xcf/0xe0 [ 440.966055] [<ffffffffb70b3460>] ? insert_kthread_work+0x40/0x40 [ 440.973715] [<ffffffffb76d8658>] ret_from_fork+0x58/0x90 [ 440.980586] [<ffffffffb70b3460>] ? insert_kthread_work+0x40/0x40 [ 440.988229] Code: 5b 41 5c 5d c3 66 90 0f 1f 44 00 00 48 8b 87 20 01 00 00 f0 0f ba 77 40 01 19 d2 85 d2 75 08 c3 0f 1f 80 00 00 00 00 55 48 89 e5 <f0> ff 48 08 48 8d 78 10 e8 7f 0f 05 00 5d c3 0f 1f 00 66 2e 0f [ 441.011620] RIP [<ffffffffb730e2b4>] __blk_mq_tag_idle+0x24/0x40 [ 441.019301] RSP <ffff893bf9bcbd10> [ 441.024052] CR2: 0000000000000008 Reported-by: Zhang Yi <yizhan@redhat.com> Tested-by: Zhang Yi <yizhan@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
chenxiang authored
[ Upstream commit affc6778 ] The status of SAS PHY is in sas_phy->enabled. There is an issue that the status of a remote SAS PHY may be initialized incorrectly: if disable remote SAS PHY through sysfs interface (such as echo 0 > /sys/class/sas_phy/phy-1:0:0/enable), then reboot the system, and we will find the status of remote SAS PHY which is disabled before is 1 (cat /sys/class/sas_phy/phy-1:0:0/enable). But actually the status of remote SAS PHY is disabled and the device attached is not found. In SAS protocol, NEGOTIATED LOGICAL LINK RATE field of DISCOVER response is 0x1 when remote SAS PHY is disabled. So initialize sas_phy->enabled according to the value of NEGOTIATED LOGICAL LINK RATE field. Signed-off-by: chenxiang <chenxiang66@hisilicon.com> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Yan authored
[ Upstream commit 2b23d950 ] The intend purpose here was to goto out if smp_execute_task() returned error. Obviously something got screwed up. We will never get these link error statistics below: ~:/sys/class/sas_phy/phy-1:0:12 # cat invalid_dword_count 0 ~:/sys/class/sas_phy/phy-1:0:12 # cat running_disparity_error_count 0 ~:/sys/class/sas_phy/phy-1:0:12 # cat loss_of_dword_sync_count 0 ~:/sys/class/sas_phy/phy-1:0:12 # cat phy_reset_problem_count 0 Obviously we should goto error handler if smp_execute_task() returns non-zero. Fixes: 2908d778 ("[SCSI] aic94xx: new driver") Signed-off-by: Jason Yan <yanaijie@huawei.com> CC: John Garry <john.garry@huawei.com> CC: chenqilin <chenqilin2@huawei.com> CC: chenxiang <chenxiang66@hisilicon.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Yan authored
[ Upstream commit 4a491b1a ] We've got a memory leak with the following producer: while true; do cat /sys/class/sas_phy/phy-1:0:12/invalid_dword_count >/dev/null; done The buffer req is allocated and not freed after we return. Fix it. Fixes: 2908d778 ("[SCSI] aic94xx: new driver") Signed-off-by: Jason Yan <yanaijie@huawei.com> CC: John Garry <john.garry@huawei.com> CC: chenqilin <chenqilin2@huawei.com> CC: chenxiang <chenxiang66@hisilicon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tang Junhui authored
[ Upstream commit 4eca1cb2 ] In such scenario that there are some flash only volumes , and some cached devices, when many tasks request these devices in writeback mode, the write IOs may fall to the same bucket as bellow: | cached data | flash data | cached data | cached data| flash data| then after writeback of these cached devices, the bucket would be like bellow bucket: | free | flash data | free | free | flash data | So, there are many free space in this bucket, but since data of flash only volumes still exists, so this bucket cannot be reclaimable, which would cause waste of bucket space. In this patch, we segregate flash only volume write streams from cached devices, so data from flash only volumes and cached devices can store in different buckets. Compare to v1 patch, this patch do not add a additionally open bucket list, and it is try best to segregate flash only volume write streams from cached devices, sectors of flash only volumes may still be mixed with dirty sectors of cached device, but the number is very small. [mlyle: fixed commit log formatting, permissions, line endings] Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tang Junhui authored
[ Upstream commit 8d29c442 ] Currently, when a cached device detaching from cache, writeback thread is not stopped, and writeback_rate_update work is not canceled. For example, after the following command: echo 1 >/sys/block/sdb/bcache/detach you can still see the writeback thread. Then you attach the device to the cache again, bcache will create another writeback thread, for example, after below command: echo ba0fb5cd-658a-4533-9806-6ce166d883b9 > /sys/block/sdb/bcache/attach then you will see 2 writeback threads. This patch stops writeback thread and cancels writeback_rate_update work when cached device detaching from cache. Compare with patch v1, this v2 patch moves code down into the register lock for safety in case of any future changes as Coly and Mike suggested. [edit by mlyle: commit log spelling/formatting] Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rui Hua authored
[ Upstream commit b221fc13 ] The read request might meet error when searching the btree, but the error was not handled in cache_lookup(), and this kind of metadata failure will not go into cached_dev_read_error(), finally, the upper layer will receive bi_status=0. In this patch we judge the metadata error by the return value of bch_btree_map_keys(), there are two potential paths give rise to the error: 1. Because the btree is not totally cached in memery, we maybe get error when read btree node from cache device (see bch_btree_node_get()), the likely errno is -EIO, -ENOMEM 2. When read miss happens, bch_btree_insert_check_key() will be called to insert a "replace_key" to btree(see cached_dev_cache_miss(), just for doing preparatory work before insert the missed data to cache device), a failure can also happen in this situation, the likely errno is -ENOMEM bch_btree_map_keys() will return MAP_DONE in normal scenario, but we will get either -EIO or -ENOMEM in above two cases. if this happened, we should NOT recover data from backing device (when cache device is dirty) because we don't know whether bkeys the read request covered are all clean. And after that happened, s->iop.status is still its initially value(0) before we submit s->bio.bio, we set it to BLK_STS_IOERR, so it can go into cached_dev_read_error(), and finally it can be passed to upper layer, or recovered by reread from backing device. [edit by mlyle: patch formatting, word-wrap, comment spelling, commit log format] Signed-off-by: Hua Rui <huarui.dev@gmail.com> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-