- 31 Jan, 2019 29 commits
-
-
Dexuan Cui authored
commit ba50bf1c upstream. fc96df16 is good and can already fix the "return stack garbage" issue, but let's also improve hv_ringbuffer_get_debuginfo(), which would silently return stack garbage, if people forget to check channel->state or ring_info->ring_buffer, when using the function in the future. Having an error check in the function would eliminate the potential risk. Add a Fixes tag to indicate the patch depdendency. Fixes: fc96df16 ("Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels") Cc: stable@vger.kernel.org Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vitaly Kuznetsov authored
commit da8ced36 upstream. Hyper-V memory hotplug protocol has 2M granularity and in Linux x86 we use 128M. To deal with it we implement partial section onlining by registering custom page onlining callback (hv_online_page()). Later, when more memory arrives we try to online the 'tail' (see hv_bring_pgs_online()). It was found that in some cases this 'tail' onlining causes issues: BUG: Bad page state in process kworker/0:2 pfn:109e3a page:ffffe08344278e80 count:0 mapcount:1 mapping:0000000000000000 index:0x0 flags: 0xfffff80000000() raw: 000fffff80000000 dead000000000100 dead000000000200 0000000000000000 raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 page dumped because: nonzero mapcount ... Workqueue: events hot_add_req [hv_balloon] Call Trace: dump_stack+0x5c/0x80 bad_page.cold.112+0x7f/0xb2 free_pcppages_bulk+0x4b8/0x690 free_unref_page+0x54/0x70 hv_page_online_one+0x5c/0x80 [hv_balloon] hot_add_req.cold.24+0x182/0x835 [hv_balloon] ... Turns out that we now have deferred struct page initialization for memory hotplug so e.g. memory_block_action() in drivers/base/memory.c does pages_correctly_probed() check and in that check it avoids inspecting struct pages and checks sections instead. But in Hyper-V balloon driver we do PageReserved(pfn_to_page()) check and this is now wrong. Switch to checking online_section_nr() instead. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paul Fulghum authored
commit fc01d8c6 upstream. Fix __might_sleep warning[1] in tty/n_hdlc.c read due to copy_to_user call while current is TASK_INTERRUPTIBLE. This is a false positive since the code path does not depend on current state remaining TASK_INTERRUPTIBLE. The loop breaks out and sets TASK_RUNNING after calling copy_to_user. This patch supresses the warning by setting TASK_RUNNING before calling copy_to_user. [1] https://syzkaller.appspot.com/bug?id=17d5de7f1fcab794cb8c40032f893f52de899324Signed-off-by: Paul Fulghum <paulkf@microgate.com> Reported-by: syzbot <syzbot+c244af085a0159d22879@syzkaller.appspotmail.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: stable <stable@vger.kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Samir Virmani authored
commit aff9cf59 upstream. We were experiencing a crash similar to the one reported as part of commit:a5ba1d95 ("uart: fix race between uart_put_char() and uart_shutdown()") in our testbed as well. We continue to observe the same crash after integrating the commit a5ba1d95 ("uart: fix race between uart_put_char() and uart_shutdown()") On reviewing the change, the port lock should be taken prior to checking for if (!circ->buf) in fn. __uart_put_char and other fns. that update the buffer uart_state->xmit. Traceback: [11/27/2018 06:24:32.4870] Unable to handle kernel NULL pointer dereference at virtual address 0000003b [11/27/2018 06:24:32.4950] PC is at memcpy+0x48/0x180 [11/27/2018 06:24:32.4950] LR is at uart_write+0x74/0x120 [11/27/2018 06:24:32.4950] pc : [<ffffffc0002e6808>] lr : [<ffffffc0003747cc>] pstate: 000001c5 [11/27/2018 06:24:32.4950] sp : ffffffc076433d30 [11/27/2018 06:24:32.4950] x29: ffffffc076433d30 x28: 0000000000000140 [11/27/2018 06:24:32.4950] x27: ffffffc0009b9d5e x26: ffffffc07ce36580 [11/27/2018 06:24:32.4950] x25: 0000000000000000 x24: 0000000000000140 [11/27/2018 06:24:32.4950] x23: ffffffc000891200 x22: ffffffc01fc34000 [11/27/2018 06:24:32.4950] x21: 0000000000000fff x20: 0000000000000076 [11/27/2018 06:24:32.4950] x19: 0000000000000076 x18: 0000000000000000 [11/27/2018 06:24:32.4950] x17: 000000000047cf08 x16: ffffffc000099e68 [11/27/2018 06:24:32.4950] x15: 0000000000000018 x14: 776d726966205948 [11/27/2018 06:24:32.4950] x13: 50203a6c6974755f x12: 74647075205d3333 [11/27/2018 06:24:32.4950] x11: 3a35323a36203831 x10: 30322f37322f3131 [11/27/2018 06:24:32.4950] x9 : 5b205d303638342e x8 : 746164206f742070 [11/27/2018 06:24:32.4950] x7 : 7520736920657261 x6 : 000000000000003b [11/27/2018 06:24:32.4950] x5 : 000000000000817a x4 : 0000000000000008 [11/27/2018 06:24:32.4950] x3 : 2f37322f31312a5b x2 : 000000000000006e [11/27/2018 06:24:32.4950] x1 : ffffffc0009b9cf0 x0 : 000000000000003b [11/27/2018 06:24:32.4950] CPU2: stopping [11/27/2018 06:24:32.4950] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P D O 4.1.51 #3 [11/27/2018 06:24:32.4950] Hardware name: Broadcom-v8A (DT) [11/27/2018 06:24:32.4950] Call trace: [11/27/2018 06:24:32.4950] [<ffffffc0000883b8>] dump_backtrace+0x0/0x150 [11/27/2018 06:24:32.4950] [<ffffffc00008851c>] show_stack+0x14/0x20 [11/27/2018 06:24:32.4950] [<ffffffc0005ee810>] dump_stack+0x90/0xb0 [11/27/2018 06:24:32.4950] [<ffffffc00008e844>] handle_IPI+0x18c/0x1a0 [11/27/2018 06:24:32.4950] [<ffffffc000080c68>] gic_handle_irq+0x88/0x90 Fixes: a5ba1d95 ("uart: fix race between uart_put_char() and uart_shutdown()") Cc: stable <stable@vger.kernel.org> Signed-off-by: Samir Virmani <samir@embedur.com> Acked-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
commit 27cfb3a5 upstream. Some tty line disciplines do not have a receive buf callback, so properly check for that before calling it. If they do not have this callback, just eat the character quietly, as we can't fail this call. Reported-by: Jann Horn <jannh@google.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Straube authored
commit 5f74a8cb upstream. This device was added to the stand-alone driver on github. Add it to the staging driver as well. Link: https://github.com/lwfinger/rtl8188eu/commit/a0619a07cd1eSigned-off-by: Michael Straube <straube.linux@gmail.com> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gustavo A. R. Silva authored
commit 701956d4 upstream. ipcnum is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/char/mwave/mwavedd.c:299 mwave_ioctl() warn: potential spectre issue 'pDrvData->IPCs' [w] (local cap) Fix this by sanitizing ipcnum before using it to index pDrvData->IPCs. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gerald Schaefer authored
commit b7cb707c upstream. smp_rescan_cpus() is called without the device_hotplug_lock, which can lead to a dedlock when a new CPU is found and immediately set online by a udev rule. This was observed on an older kernel version, where the cpu_hotplug_begin() loop was still present, and it resulted in hanging chcpu and systemd-udev processes. This specific deadlock will not show on current kernels. However, there may be other possible deadlocks, and since smp_rescan_cpus() can still trigger a CPU hotplug operation, the device_hotplug_lock should be held. For reference, this was the deadlock with the old cpu_hotplug_begin() loop: chcpu (rescan) systemd-udevd echo 1 > /sys/../rescan -> smp_rescan_cpus() -> (*) get_online_cpus() (increases refcount) -> smp_add_present_cpu() (new CPU found) -> register_cpu() -> device_add() -> udev "add" event triggered -----------> udev rule sets CPU online -> echo 1 > /sys/.../online -> lock_device_hotplug_sysfs() (this is missing in rescan path) -> device_online() -> (**) device_lock(new CPU dev) -> cpu_up() -> cpu_hotplug_begin() (loops until refcount == 0) -> deadlock with (*) -> bus_probe_device() -> device_attach() -> device_lock(new CPU dev) -> deadlock with (**) Fix this by taking the device_hotplug_lock in the CPU rescan path. Cc: <stable@vger.kernel.org> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christian Borntraeger authored
commit 03aa047e upstream. Right now the early machine detection code check stsi 3.2.2 for "KVM" and set MACHINE_IS_VM if this is different. As the console detection uses diagnose 8 if MACHINE_IS_VM returns true this will crash Linux early for any non z/VM system that sets a different value than KVM. So instead of assuming z/VM, do not set any of MACHINE_IS_LPAR, MACHINE_IS_VM, or MACHINE_IS_KVM. CC: stable@vger.kernel.org Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eugeniy Paltsev authored
commit 3affbf0e upstream. So far we've mapped branches to "ijmp" which also counts conditional branches NOT taken. This makes us different from other architectures such as ARM which seem to be counting only taken branches. So use "ijmptak" hardware condition which only counts (all jump instructions that are taken) 'ijmptak' event is available on both ARCompact and ARCv2 ISA based cores. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Cc: stable@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: reworked changelog] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eugeniy Paltsev authored
commit a3010a04 upstream. In setup_arch_memory we reserve the memory area wherein the kernel is located. Current implementation may reserve more memory than it actually required in case of CONFIG_LINUX_LINK_BASE is not equal to CONFIG_LINUX_RAM_BASE. This happens because we calculate start of the reserved region relatively to the CONFIG_LINUX_RAM_BASE and end of the region relatively to the CONFIG_LINUX_RAM_BASE. For example in case of HSDK board we wasted 256MiB of physical memory: ------------------->8------------------------------ Memory: 770416K/1048576K available (5496K kernel code, 240K rwdata, 1064K rodata, 2200K init, 275K bss, 278160K reserved, 0K cma-reserved) ------------------->8------------------------------ Fix that. Fixes: 9ed68785 ("ARC: mm: Decouple RAM base address from kernel link addr") Cc: stable@vger.kernel.org #4.14+ Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eugeniy Paltsev authored
commit e6a72b7d upstream. ARCv2 optimized memset uses PREFETCHW instruction for prefetching the next cache line but doesn't ensure that the line is not past the end of the buffer. PRETECHW changes the line ownership and marks it dirty, which can cause issues in SMP config when next line was already owned by other core. Fix the issue by avoiding the PREFETCHW Some more details: The current code has 3 logical loops (ignroing the unaligned part) (a) Big loop for doing aligned 64 bytes per iteration with PREALLOC (b) Loop for 32 x 2 bytes with PREFETCHW (c) any left over bytes loop (a) was already eliding the last 64 bytes, so PREALLOC was safe. The fix was removing PREFETCW from (b). Another potential issue (applicable to configs with 32 or 128 byte L1 cache line) is that PREALLOC assumes 64 byte cache line and may not do the right thing specially for 32b. While it would be easy to adapt, there are no known configs with those lie sizes, so for now, just compile out PREALLOC in such cases. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Cc: stable@vger.kernel.org #4.4+ Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: rewrote changelog, used asm .macro vs. "C" macro] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anthony Wong authored
commit 69939038 upstream. Support speaker and mic mute LEDs on HP ProBook 470 G5. BugLink: https://bugs.launchpad.net/bugs/1811254Signed-off-by: Anthony Wong <anthony.wong@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gustavo A. R. Silva authored
commit 060d0bf4 upstream. There is a potential NULL pointer dereference in case devm_kzalloc() fails and returns NULL. Fix this by adding a NULL check on rt5514_dsp. This issue was detected with the help of Coccinelle. Fixes: 6eebf35b ("ASoC: rt5514: add rt5514 SPI driver") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kangjie Lu authored
commit 44fabd8c upstream. snd_pcm_lib_malloc_pages() may fail, so let's check its status and return its error code upstream. Signed-off-by: Kangjie Lu <kjlu@umn.edu> Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Charles Yeh authored
commit 4dcf9ddc upstream. Add new PID to support PL2303TB (TYPE_HX) Signed-off-by: Charles Yeh <charlesyeh522@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Max Schulze authored
commit b81c2c33 upstream. Add new Motorola Tetra device id for Motorola Solutions TETRA PEI device T: Bus=02 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 4 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=0cad ProdID=9016 Rev=24.16 S: Manufacturer=Motorola Solutions, Inc. S: Product=TETRA PEI interface C: #Ifs= 2 Cfg#= 1 Atr=80 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=usb_serial_simple I: If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=usb_serial_simple Signed-off-by: Max Schulze <max.schulze@posteo.de> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tomas Winkler authored
commit f7ee8ead upstream. Add the Denverton innovation engine (IE) device ids. The IE is an ME-like device which provides HW security offloading. Cc: <stable@vger.kernel.org> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vijay Viswanath authored
commit 99d570da upstream. Enable CONFIG_MMC_SDHCI_IO_ACCESSORS so that SDHC controller specific register read and write APIs, if registered, can be used. Signed-off-by: Vijay Viswanath <vviswana@codeaurora.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Cc: Koen Vandeputte <koen.vandeputte@ncentric.com> Cc: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Abeni authored
[ Upstream commit f6f2a4a2 ] Setting the low threshold to 0 has no effect on frags allocation, we need to clear high_thresh instead. The code was pre-existent to commit 648700f7 ("inet: frags: use rhashtables for reassembly units"), but before the above, such assignment had a different role: prevent concurrent eviction from the worker and the netns cleanup helper. Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Willem de Bruijn authored
[ Upstream commit 13d7f463 ] TCP transmission with MSG_ZEROCOPY fails if the peer closes its end of the connection and so transitions this socket to CLOSE_WAIT state. Transmission in close wait state is acceptable. Other similar tests in the stack (e.g., in FastOpen) accept both states. Relax this test, too. Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg276886.html Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg227390.html Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY") Reported-by: Marek Majkowski <marek@cloudflare.com> Signed-off-by: Willem de Bruijn <willemb@google.com> CC: Yuchung Cheng <ycheng@google.com> CC: Neal Cardwell <ncardwell@google.com> CC: Soheil Hassas Yeganeh <soheil@google.com> CC: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ido Schimmel authored
[ Upstream commit f97f4dd8 ] IPv4 routing tables are flushed in two cases: 1. In response to events in the netdev and inetaddr notification chains 2. When a network namespace is being dismantled In both cases only routes associated with a dead nexthop group are flushed. However, a nexthop group will only be marked as dead in case it is populated with actual nexthops using a nexthop device. This is not the case when the route in question is an error route (e.g., 'blackhole', 'unreachable'). Therefore, when a network namespace is being dismantled such routes are not flushed and leaked [1]. To reproduce: # ip netns add blue # ip -n blue route add unreachable 192.0.2.0/24 # ip netns del blue Fix this by not skipping error routes that are not marked with RTNH_F_DEAD when flushing the routing tables. To prevent the flushing of such routes in case #1, add a parameter to fib_table_flush() that indicates if the table is flushed as part of namespace dismantle or not. Note that this problem does not exist in IPv6 since error routes are associated with the loopback device. [1] unreferenced object 0xffff888066650338 (size 56): comm "ip", pid 1206, jiffies 4294786063 (age 26.235s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff ..........ba.... e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00 ...d............ backtrace: [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220 [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20 [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380 [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690 [<0000000014f62875>] netlink_sendmsg+0x929/0xe10 [<00000000bac9d967>] sock_sendmsg+0xc8/0x110 [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0 [<000000002e94f880>] __sys_sendmsg+0xf7/0x250 [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610 [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000003a8b605b>] 0xffffffffffffffff unreferenced object 0xffff888061621c88 (size 48): comm "ip", pid 1206, jiffies 4294786063 (age 26.235s) hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff kkkkkkkk..&_.... backtrace: [<00000000733609e3>] fib_table_insert+0x978/0x1500 [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220 [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20 [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380 [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690 [<0000000014f62875>] netlink_sendmsg+0x929/0xe10 [<00000000bac9d967>] sock_sendmsg+0xc8/0x110 [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0 [<000000002e94f880>] __sys_sendmsg+0xf7/0x250 [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610 [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000003a8b605b>] 0xffffffffffffffff Fixes: 8cced9ef ("[NETNS]: Enable routing configuration in non-initial namespace.") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Wang authored
[ Upstream commit cc5e7107 ] Vhost dirty page logging API is designed to sync through GPA. But we try to log GIOVA when device IOTLB is enabled. This is wrong and may lead to missing data after migration. To solve this issue, when logging with device IOTLB enabled, we will: 1) reuse the device IOTLB translation result of GIOVA->HVA mapping to get HVA, for writable descriptor, get HVA through iovec. For used ring update, translate its GIOVA to HVA 2) traverse the GPA->HVA mapping to get the possible GPA and log through GPA. Pay attention this reverse mapping is not guaranteed to be unique, so we should log each possible GPA in this case. This fix the failure of scp to guest during migration. In -next, we will probably support passing GIOVA->GPA instead of GIOVA->HVA. Fixes: 6b1e6cc7 ("vhost: new device IOTLB API") Reported-by: Jintack Lim <jintack@cs.columbia.edu> Cc: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ross Lagerwall authored
[ Upstream commit 04a4af33 ] For nested and variable attributes, the expected length of an attribute is not known and marked by a negative number. This results in an OOB read when the expected length is later used to check if the attribute is all zeros. Fix this by using the actual length of the attribute rather than the expected length. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Cong Wang authored
[ Upstream commit cd0c4e70 ] Martin reported a set of filters don't work after changing from reclassify to continue. Looking into the code, it looks like skb protocol is not always fetched for each iteration of the filters. But, as demonstrated by Martin, TC actions could modify skb->protocol, for example act_vlan, this means we have to refetch skb protocol in each iteration, rather than using the one we fetch in the beginning of the loop. This bug is _not_ introduced by commit 3b3ae880 ("net: sched: consolidate tc_classify{,_compat}"), technically, if act_vlan is the only action that modifies skb protocol, then it is commit c7e2b968 ("sched: introduce vlan action") which introduced this bug. Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Petazzoni authored
[ Upstream commit e40e2a2e ] The current code in __mdiobus_register() doesn't properly handle failures returned by the devm_gpiod_get_optional() call: it returns immediately, without unregistering the device that was added by the call to device_register() earlier in the function. This leaves a stale device, which then causes a NULL pointer dereference in the code that handles deferred probing: [ 1.489982] Unable to handle kernel NULL pointer dereference at virtual address 00000074 [ 1.498110] pgd = (ptrval) [ 1.500838] [00000074] *pgd=00000000 [ 1.504432] Internal error: Oops: 17 [#1] SMP ARM [ 1.509133] Modules linked in: [ 1.512192] CPU: 1 PID: 51 Comm: kworker/1:3 Not tainted 4.20.0-00039-g3b73a4cc8b3e-dirty #99 [ 1.520708] Hardware name: Xilinx Zynq Platform [ 1.525261] Workqueue: events deferred_probe_work_func [ 1.530403] PC is at klist_next+0x10/0xfc [ 1.534403] LR is at device_for_each_child+0x40/0x94 [ 1.539361] pc : [<c0683fbc>] lr : [<c0455d90>] psr: 200e0013 [ 1.545628] sp : ceeefe68 ip : 00000001 fp : ffffe000 [ 1.550863] r10: 00000000 r9 : c0c66790 r8 : 00000000 [ 1.556079] r7 : c0457d44 r6 : 00000000 r5 : ceeefe8c r4 : cfa2ec78 [ 1.562604] r3 : 00000064 r2 : c0457d44 r1 : ceeefe8c r0 : 00000064 [ 1.569129] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 1.576263] Control: 18c5387d Table: 0ed7804a DAC: 00000051 [ 1.582013] Process kworker/1:3 (pid: 51, stack limit = 0x(ptrval)) [ 1.588280] Stack: (0xceeefe68 to 0xceef0000) [ 1.592630] fe60: cfa2ec78 c0c03c08 00000000 c0457d44 00000000 c0c66790 [ 1.600814] fe80: 00000000 c0455d90 ceeefeac 00000064 00000000 0d7a542e cee9d494 cfa2ec78 [ 1.608998] fea0: cfa2ec78 00000000 c0457d44 c0457d7c cee9d494 c0c03c08 00000000 c0455dac [ 1.617182] fec0: cf98ba44 cf926a00 cee9d494 0d7a542e 00000000 cf935a10 cf935a10 cf935a10 [ 1.625366] fee0: c0c4e9b8 c0457d7c c0c4e80c 00000001 cf935a10 c0457df4 cf935a10 c0c4e99c [ 1.633550] ff00: c0c4e99c c045a27c c0c4e9c4 ced63f80 cfde8a80 cfdebc00 00000000 c013893c [ 1.641734] ff20: cfde8a80 cfde8a80 c07bd354 ced63f80 ced63f94 cfde8a80 00000008 c0c02d00 [ 1.649936] ff40: cfde8a98 cfde8a80 ffffe000 c0139a30 ffffe000 c0c6624a c07bd354 00000000 [ 1.658120] ff60: ffffe000 cee9e780 ceebfe00 00000000 ceeee000 ced63f80 c0139788 cf8cdea4 [ 1.666304] ff80: cee9e79c c013e598 00000001 ceebfe00 c013e44c 00000000 00000000 00000000 [ 1.674488] ffa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000 [ 1.682671] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1.690855] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [ 1.699058] [<c0683fbc>] (klist_next) from [<c0455d90>] (device_for_each_child+0x40/0x94) [ 1.707241] [<c0455d90>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88) [ 1.716476] [<c0457d7c>] (device_reorder_to_tail) from [<c0455dac>] (device_for_each_child+0x5c/0x94) [ 1.725692] [<c0455dac>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88) [ 1.734927] [<c0457d7c>] (device_reorder_to_tail) from [<c0457df4>] (device_pm_move_to_tail+0x28/0x40) [ 1.744235] [<c0457df4>] (device_pm_move_to_tail) from [<c045a27c>] (deferred_probe_work_func+0x58/0x8c) [ 1.753746] [<c045a27c>] (deferred_probe_work_func) from [<c013893c>] (process_one_work+0x210/0x4fc) [ 1.762888] [<c013893c>] (process_one_work) from [<c0139a30>] (worker_thread+0x2a8/0x5c0) [ 1.771072] [<c0139a30>] (worker_thread) from [<c013e598>] (kthread+0x14c/0x154) [ 1.778482] [<c013e598>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c) [ 1.785689] Exception stack(0xceeeffb0 to 0xceeefff8) [ 1.790739] ffa0: 00000000 00000000 00000000 00000000 [ 1.798923] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1.807107] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 1.813724] Code: e92d47f0 e1a05000 e8900048 e1a00003 (e5937010) [ 1.819844] ---[ end trace 3c2c0c8b65399ec9 ]--- The actual error that we had from devm_gpiod_get_optional() was -EPROBE_DEFER, due to the GPIO being provided by a driver that is probed later than the Ethernet controller driver. To fix this, we simply add the missing device_del() invocation in the error path. Fixes: 69226896 ("mdio_bus: Issue GPIO RESET to PHYs") Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ross Lagerwall authored
[ Upstream commit 6c57f045 ] In certain cases, pskb_trim_rcsum() may change skb pointers. Reinitialize header pointers afterwards to avoid potential use-after-frees. Add a note in the documentation of pskb_trim_rcsum(). Found by KASAN. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yunjian Wang authored
[ Upstream commit 28c1382f ] The skb header should be set to ethernet header before using is_skb_forwardable. Because the ethernet header length has been considered in is_skb_forwardable(including dev->hard_header_len length). To reproduce the issue: 1, add 2 ports on linux bridge br using following commands: $ brctl addbr br $ brctl addif br eth0 $ brctl addif br eth1 2, the MTU of eth0 and eth1 is 1500 3, send a packet(Data 1480, UDP 8, IP 20, Ethernet 14, VLAN 4) from eth0 to eth1 So the expect result is packet larger than 1500 cannot pass through eth0 and eth1. But currently, the packet passes through success, it means eth1's MTU limit doesn't take effect. Fixes: f6367b46 ("bridge: use is_skb_forwardable in forward path") Cc: bridge@lists.linux-foundation.org Cc: Nkolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lendacky, Thomas authored
[ Upstream commit 5ab3121b ] The XGBE hardware has support for performing MDIO operations using an MDIO command request. The driver mistakenly uses the mdio port address as the MDIO command request device address instead of the MDIO command request port address. Additionally, the driver does not properly check for and create a clause 45 MDIO command. Check the supplied MDIO register to determine if the request is a clause 45 operation (MII_ADDR_C45). For a clause 45 operation, extract the device address and register number from the supplied MDIO register and use them to set the MDIO command request device address and register number fields. For a clause 22 operation, the MDIO request device address is set to zero and the MDIO command request register number is set to the supplied MDIO register. In either case, the supplied MDIO port address is used as the MDIO command request port address. Fixes: 732f2ab7 ("amd-xgbe: Add support for MDIO attached PHYs") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 Jan, 2019 11 commits
-
-
Greg Kroah-Hartman authored
-
Corey Minyard authored
commit 7d6380cd upstream. The block number was not being compared right, it was off by one when checking the response. Some statistics wouldn't be incremented properly in some cases. Check to see if that middle-part messages always have 31 bytes of data. Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marc Zyngier authored
commit 3f7bb2ec upstream. The write to the status register is really an ACK for the HW, and should be treated as such by the driver. Let's move it to the irq_ack() callback, which will prevent people from moving it around in order to paper over other bugs. Fixes: 8c934095 ("PCI: dwc: Clear MSI interrupt status after it is handled, not before") Fixes: 7c5925af ("PCI: dwc: Move MSI IRQs allocation to IRQ domains hierarchical API") Link: https://lore.kernel.org/linux-pci/20181113225734.8026-1-marc.zyngier@arm.com/Reported-by: Trent Piepho <tpiepho@impinj.com> Tested-by: Niklas Cassel <niklas.cassel@linaro.org> Tested-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> Tested-by: Stanimir Varbanov <svarbanov@mm-sol.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zhenyu Wang authored
commit 51b00d85 upstream. This is to fix missed mmap range check on vGPU bar2 region and only allow to map vGPU allocated GMADDR range, which means user space should support sparse mmap to get proper offset for mmap vGPU aperture. And this takes care of actual pgoff in mmap request as original code always does from beginning of vGPU aperture. Fixes: 659643f7 ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT") Cc: "Monroy, Rodrigo Axel" <rodrigo.axel.monroy@intel.com> Cc: "Orrala Contreras, Alfredo" <alfredo.orrala.contreras@intel.com> Cc: stable@vger.kernel.org # v4.10+ Reviewed-by: Hang Yuan <hang.yuan@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steve French authored
commit 7420451f upstream. allow disabling cifs (SMB1 ie vers=1.0) and vers=2.0 in the config for the build of cifs.ko if want to always prevent mounting with these less secure dialects. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Jeremy Allison <jra@samba.org> Cc: Alakesh Haloi <alakeshh@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Corey Minyard authored
commit bc48fa1b upstream. Realtek has some sort of "Virtual" IPMI device on the PCI bus as a KCS controller, but whatever it is, it's not one. Ignore it if seen. [ Commit 13d0b35c (ipmi_si: Move PCI setup to another file) from Linux 4.15-rc1 has not been back ported, so the PCI code is still in `drivers/char/ipmi/ipmi_si_intf.c`, requiring to apply the commit manually. This fixes a 100 s boot delay on the HP EliteDesk 705 G4 MT with Linux 4.14.94. ] Reported-by: Chris Chiu <chiu@endlessm.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Tested-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Scott Mayhew authored
commit c156618e upstream. The following deadlock can occur between a process waiting for a client to initialize in while walking the client list during nfsv4 server trunking detection and another process waiting for the nfs_clid_init_mutex so it can initialize that client: Process 1 Process 2 --------- --------- spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTA->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTB->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); mutex_lock(&nfs_clid_init_mutex); nfs41_walk_client_list(clp, result, cred); nfs_wait_client_init_complete(CLIENTA); (waiting for nfs_clid_init_mutex) Make sure nfs_match_client() only evaluates clients that have completed initialization in order to prevent that deadlock. This patch also fixes v4.0 trunking behavior by not marking the client NFS_CS_READY until the clientid has been confirmed. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Qian Lu <luqia@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Hocko authored
[ Upstream commit 7550c607 ] Patch series "THP eligibility reporting via proc". This series of three patches aims at making THP eligibility reporting much more robust and long term sustainable. The trigger for the change is a regression report [2] and the long follow up discussion. In short the specific application didn't have good API to query whether a particular mapping can be backed by THP so it has used VMA flags to workaround that. These flags represent a deep internal state of VMAs and as such they should be used by userspace with a great deal of caution. A similar has happened for [3] when users complained that VM_MIXEDMAP is no longer set on DAX mappings. Again a lack of a proper API led to an abuse. The first patch in the series tries to emphasise that that the semantic of flags might change and any application consuming those should be really careful. The remaining two patches provide a more suitable interface to address [2] and provide a consistent API to query the THP status both for each VMA and process wide as well. [1] http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz This patch (of 3): Even though vma flags exported via /proc/<pid>/smaps are explicitly documented to be not guaranteed for future compatibility the warning doesn't go far enough because it doesn't mention semantic changes to those flags. And they are important as well because these flags are a deep implementation internal to the MM code and the semantic might change at any time. Let's consider two recent examples: http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz : commit e1fb4a08 "dax: remove VM_MIXEDMAP for fsdax and device dax" has : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the : mean time certain customer of ours started poking into /proc/<pid>/smaps : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA : flags, the application just fails to start complaining that DAX support is : missing in the kernel. http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com : Commit 18600332 ("mm: make PR_SET_THP_DISABLE immediately active") : introduced a regression in that userspace cannot always determine the set : of vmas where thp is ineligible. : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps : to determine if a vma is eligible to be backed by hugepages. : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to : be disabled and emit "nh" as a flag for the corresponding vmas as part of : /proc/pid/smaps. After the commit, thp is disabled by means of an mm : flag and "nh" is not emitted. : This causes smaps parsing libraries to assume a vma is eligible for thp : and ends up puzzling the user on why its memory is not backed by thp. In both cases userspace was relying on a semantic of a specific VMA flag. The primary reason why that happened is a lack of a proper interface. While this has been worked on and it will be fixed properly, it seems that our wording could see some refinement and be more vocal about semantic aspect of these flags as well. Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Oppenheimer <bepvte@gmail.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Aaron Lu authored
[ Upstream commit 66f71da9 ] Since a2468cc9 ("swap: choose swap device according to numa node"), avail_lists field of swap_info_struct is changed to an array with MAX_NUMNODES elements. This made swap_info_struct size increased to 40KiB and needs an order-4 page to hold it. This is not optimal in that: 1 Most systems have way less than MAX_NUMNODES(1024) nodes so it is a waste of memory; 2 It could cause swapon failure if the swap device is swapped on after system has been running for a while, due to no order-4 page is available as pointed out by Vasily Averin. Solve the above two issues by using nr_node_ids(which is the actual possible node number the running system has) for avail_lists instead of MAX_NUMNODES. nr_node_ids is unknown at compile time so can't be directly used when declaring this array. What I did here is to declare avail_lists as zero element array and allocate space for it when allocating space for swap_info_struct. The reason why keep using array but not pointer is plist_for_each_entry needs the field to be part of the struct, so pointer will not work. This patch is on top of Vasily Averin's fix commit. I think the use of kvzalloc for swap_info_struct is still needed in case nr_node_ids is really big on some systems. Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.comSigned-off-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Brian Foster authored
[ Upstream commit 3fa750dc ] write_cache_pages() is used in both background and integrity writeback scenarios by various filesystems. Background writeback is mostly concerned with cleaning a certain number of dirty pages based on various mm heuristics. It may not write the full set of dirty pages or wait for I/O to complete. Integrity writeback is responsible for persisting a set of dirty pages before the writeback job completes. For example, an fsync() call must perform integrity writeback to ensure data is on disk before the call returns. write_cache_pages() unconditionally breaks out of its processing loop in the event of a ->writepage() error. This is fine for background writeback, which had no strict requirements and will eventually come around again. This can cause problems for integrity writeback on filesystems that might need to clean up state associated with failed page writeouts. For example, XFS performs internal delayed allocation accounting before returning a ->writepage() error, where applicable. If the current writeback happens to be associated with an unmount and write_cache_pages() completes the writeback prematurely due to error, the filesystem is unmounted in an inconsistent state if dirty+delalloc pages still exist. To handle this problem, update write_cache_pages() to always process the full set of pages for integrity writeback regardless of ->writepage() errors. Save the first encountered error and return it to the caller once complete. This facilitates XFS (or any other fs that expects integrity writeback to process the entire set of dirty pages) to clean up its internal state completely in the event of persistent mapping errors. Background writeback continues to exit on the first error encountered. [akpm@linux-foundation.org: fix typo in comment] Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.comSigned-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Junxiao Bi authored
[ Upstream commit 532e1e54 ] mount.ocfs2 ignore the inconsistent error that journal is clean but local alloc is unrecovered. After mount, local alloc not empty, then reserver cluster didn't alloc a new local alloc window, reserveration map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the following panic. This issue was reported at https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html and was advised to fixed during mount. But this is a very unusual inconsistent state, usually journal dirty flag should be cleared at the last stage of umount until every other things go right. We may need do further debug to check that. Any way to avoid possible futher corruption, mount should be abort and fsck should be run. (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered! found = 6518, set = 6518, taken = 8192, off = 15912372 ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode. o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode. o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777 o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes ------------[ cut here ]------------ kernel BUG at fs/ocfs2/reservations.c:507! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018 task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000 RIP: 0010:[<ffffffffa05e96a8>] [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2] Call Trace: ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2] ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2] __ocfs2_claim_clusters+0x178/0x360 [ocfs2] ocfs2_claim_clusters+0x1f/0x30 [ocfs2] ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2] ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2] ocfs2_write_begin+0x13e/0x230 [ocfs2] generic_perform_write+0xbf/0x1c0 __generic_file_write_iter+0x19c/0x1d0 ocfs2_file_write_iter+0x589/0x1360 [ocfs2] __vfs_write+0xb8/0x110 vfs_write+0xa9/0x1b0 SyS_write+0x46/0xb0 system_call_fastpath+0x18/0xd7 Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85 RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2] RSP <ffff8800ea4db668> ---[ end trace 566f07529f2edf3c ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.comSigned-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-