1. 11 May, 2016 21 commits
  2. 04 May, 2016 19 commits
    • Linux 3.14.68 · 48763742
      Greg Kroah-Hartman authored
    • sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race · 44a89080
      NeilBrown authored
      commit a6ab1e81 upstream.
      
      sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer
      set.  In this case it aborts the queuing of the upcall.
      However it has already taken a new counted reference on "h" and
      doesn't "put" it, even though it frees the data structure holding the reference.
      
      So let's delay the "cache_get" until we know we need it.
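      A minimal userspace sketch of the reordering, under stated assumptions:
      struct cache_head_model, try_queue_upcall() and the pending flag are
      hypothetical stand-ins for struct cache_head, sunrpc_cache_pipe_upcall()
      and CACHE_PENDING, and the real function's locking is omitted. The point
      is only that the reference is taken after the race check, so the abort
      path has nothing to put.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a refcounted cache entry (not the kernel's struct cache_head). */
struct cache_head_model {
	int refcount;   /* counted references held on this entry */
	bool pending;   /* models the CACHE_PENDING bit */
};

/* Hypothetical model of a queued upcall that holds one reference on h. */
struct upcall_model {
	struct cache_head_model *h;
};

static void cache_get_model(struct cache_head_model *h) { h->refcount++; }
static void cache_put_model(struct cache_head_model *h) { h->refcount--; }

/*
 * Returns the queued upcall, or NULL if the race was detected.
 * The reference on h is taken only once we know the upcall will be queued.
 */
static struct upcall_model *try_queue_upcall(struct cache_head_model *h)
{
	struct upcall_model *u = malloc(sizeof(*u));
	if (!u)
		return NULL;
	if (!h->pending) {
		/* Race detected: no reference was taken yet, so freeing is enough. */
		free(u);
		return NULL;
	}
	cache_get_model(h);  /* delayed until we know we need it */
	u->h = h;
	return u;
}

/* Dropping the upcall puts the reference it was holding. */
static void release_upcall(struct upcall_model *u)
{
	cache_put_model(u->h);
	free(u);
}
```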
      
      Fixes: f9e1aedc ("sunrpc/cache: remove races with queuing an upcall.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • jme: Fix device PM wakeup API usage · bfdaced9
      Guo-Fu Tseng authored
      commit 81422e67 upstream.
      
      According to Documentation/power/devices.txt, the driver should not use
      device_set_wakeup_enable(), because enabling wakeup is a policy decision
      left to the user.

      Use device_init_wakeup() to initialize dev->power.should_wakeup and
      dev->power.can_wakeup at driver initialization.

      Use device_may_wakeup() on suspend to decide whether the WoL function
      should be enabled on the NIC.
      Reported-by: Diego Viola <diego.viola@gmail.com>
      Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • jme: Do not enable NIC WoL functions on S0 · 560fd287
      Guo-Fu Tseng authored
      commit 0772a99b upstream.
      
      Otherwise, on some hardware the system might resume right after
      going into suspend.
      Reported-by: Diego Viola <diego.viola@gmail.com>
      Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bus: imx-weim: Take the 'status' property value into account · 85a3952d
      Fabio Estevam authored
      commit 33b96d2c upstream.
      
      Currently the behaviour is incorrect when multiple devices are present
      under the weim node. For example:
      
      &weim {
      	...
      	status = "okay";

      	sram@0,0 {
      		...
      		status = "okay";
      	};

      	mram@0,0 {
      		...
      		status = "disabled";
      	};
      };
      
      In this case only the 'sram' device should be probed, not 'mram'.

      However, the status property is currently ignored, causing the 'sram'
      device to be disabled and 'mram' to be enabled.

      Change the weim_parse_dt() function to use
      for_each_available_child_of_node() so that devices marked with
      'status = "disabled"' are not probed.
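      A userspace model of the availability rule that
      for_each_available_child_of_node() applies, as I understand it: a node
      counts as available when its status property is absent, "okay" or "ok".
      struct dt_node_model and probe_available_children() are hypothetical;
      the real code walks struct device_node through the OF API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened model of a device-tree child node. */
struct dt_node_model {
	const char *name;
	const char *status;  /* NULL when the status property is absent */
};

/* Availability rule: a missing status, "okay" or "ok" means available. */
static bool node_is_available(const struct dt_node_model *n)
{
	if (n->status == NULL)
		return true;
	return strcmp(n->status, "okay") == 0 || strcmp(n->status, "ok") == 0;
}

/*
 * Record only the available children, in the spirit of
 * for_each_available_child_of_node(). Returns how many were "probed".
 */
static size_t probe_available_children(const struct dt_node_model *children,
				       size_t n, const char **probed)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++) {
		if (!node_is_available(&children[i]))
			continue;  /* e.g. status = "disabled" is skipped */
		probed[count++] = children[i].name;
	}
	return count;
}
```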
      Suggested-by: Wolfgang Netbal <wolfgang.netbal@sigmatek.at>
      Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
      Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
      Acked-by: Shawn Guo <shawnguo@kernel.org>
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: OMAP3: Add cpuidle parameters table for omap3430 · d37eea97
      Pali Rohár authored
      commit 98f42221 upstream.
      
      Choose either the generic omap3 or the omap3430-specific cpuidle
      parameters based on the CPU type. The omap3430 parameters were measured
      on the Nokia N900 and added by commit 5a1b1d3a ("OMAP3: RX-51: Pass cpu
      idle parameters"), but were later removed by commit 231900af ("ARM:
      OMAP3: cpuidle - remove rx51 cpuidle parameters table") because of the
      large code complexity involved.

      This patch brings back the cpuidle parameters for omap3430 devices, but
      selects them with a simple condition based on the CPU type.
      
      Fixes: 231900af ("ARM: OMAP3: cpuidle - remove rx51 cpuidle
      parameters table")
      Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf stat: Document --detailed option · d601c53f
      Borislav Petkov authored
      commit f594bae0 upstream.
      
      I'm surprised this remained undocumented since at least 2011. And it is
      actually a very useful switch, as Steve and I came to realize recently.
      
      Add the text from
      
        2cba3ffb ("perf stat: Add -d -d and -d -d -d options to show more CPU events")
      
      which added the incrementing aspect to -d.
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Davidlohr Bueso <dbueso@suse.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mel Gorman <mgorman@suse.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 2cba3ffb ("perf stat: Add -d -d and -d -d -d options to show more CPU events")
      Link: http://lkml.kernel.org/r/1457347294-32546-1-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: vmbus: prevent cpu offlining on newer hypervisors · f7aecad1
      Vitaly Kuznetsov authored
      commit e513229b upstream.
      
      When an SMP Hyper-V guest is running on top of Windows Server 2012 R2 and
      secondary cpus are taken offline (with echo 0 > /sys/devices/system/cpu/cpu$cpu/online),
      the system freezes. This happens because on newer hypervisors (Win8,
      WS2012R2, ...) vmbus channel handlers are distributed across all cpus
      (see the init_vp_index() function in drivers/hv/channel_mgmt.c), and when
      a cpu goes offline nobody reassigns its handlers to CPU0. Prevent cpu
      offlining when vmbus is loaded until the issue is fixed host-side.

      This patch also disables hibernation, which is acceptable because
      hibernation is broken anyway (an MCE error is hit on resume). Suspend
      still works.
      
      Tested with WS2008R2 and WS2012R2.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      [ 3chas3@gmail.com: rebase to 3.14-stable ]
      Signed-off-by: Chas Williams <3chas3@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • include/linux/poison.h: fix LIST_POISON{1,2} offset · 703d87a5
      Vasily Kulikov authored
      commit 8a5e5e02 upstream.
      
      Poison pointer values should be small enough to fit in non-mmap'able or
      hardly-mmap'able space.  E.g. on x86 the "poison pointer space" starts
      at 0x0.  Given that unprivileged users cannot mmap anything below
      mmap_min_addr, it should be safe to use poison pointers lower than
      mmap_min_addr.

      The current poison pointer values of LIST_POISON{1,2} might be too big
      for mmap_min_addr values equal to or less than 1 MB (the common case,
      e.g. Ubuntu uses only 0x10000).  There is little point in using such a
      big value given that the "poison pointer space" below 1 MB is not yet
      exhausted.  Changing it to a smaller value solves the problem for small
      mmap_min_addr setups.
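      A small arithmetic check of the argument above. It assumes
      POISON_POINTER_DELTA is 0 (no CONFIG_ILLEGAL_POINTER_VALUE offset) and
      takes 0x00100100/0x00200200 as the old LIST_POISON{1,2} values and
      0x100/0x200 as the new ones; the macro names below are hypothetical
      copies, not the header itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: POISON_POINTER_DELTA == 0, i.e. no illegal-pointer offset. */
#define POISON_DELTA_MODEL 0UL

#define OLD_LIST_POISON1 (0x00100100UL + POISON_DELTA_MODEL)
#define OLD_LIST_POISON2 (0x00200200UL + POISON_DELTA_MODEL)
#define NEW_LIST_POISON1 (0x100UL + POISON_DELTA_MODEL)
#define NEW_LIST_POISON2 (0x200UL + POISON_DELTA_MODEL)

/* A poison value is only safe if unprivileged users cannot map the page it points into. */
static bool poison_below_mmap_min(uintptr_t poison, uintptr_t mmap_min_addr)
{
	return poison < mmap_min_addr;
}
```

      With mmap_min_addr = 0x10000, the old values (just above 1 MB) are
      mappable by an unprivileged user, while the new ones are not.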
      
      The values are suggested by Solar Designer:
      http://www.openwall.com/lists/oss-security/2015/05/02/6
      Signed-off-by: Vasily Kulikov <segoon@openwall.com>
      Cc: Solar Designer <solar@openwall.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock · 085dc0f0
      Geert Uytterhoeven authored
      commit ff1cab37 upstream.
      
      The BSP team noticed a spinlock/mutex issue in sh-sci when CPUFREQ is
      used.  The notifier function may call mutex_lock() while the spinlock
      is held, which can lead to a BUG().  This may happen if CPUFREQ is
      changed while another CPU calls clk_get_rate().
      
      Taking the spinlock was added to the notifier function in commit
      e552de24 ("sh-sci: add platform device private data"), to
      protect the list of serial ports against modification during traversal.
      At that time the Common Clock Framework didn't exist yet, and
      clk_get_rate() just returned clk->rate without taking a mutex.
      Note that since commit d535a230 ("serial: sh-sci: Require a
      device per port mapping."), there's no longer a list of serial ports to
      traverse, and taking the spinlock became superfluous.
      
      To fix the issue, just remove the cpufreq notifier:
        1. The notifier doesn't work correctly: all it does is update stored
           clock rates; it does not update the divider in the hardware.
           The divider will only be updated when calling sci_set_termios().
           I believe this was broken back in 2004, when the old
           drivers/char/sh-sci.c driver (where the notifier did update the
           divider) was replaced by drivers/serial/sh-sci.c (where the
           notifier just updated port->uartclk).
           Cfr. full-history-linux commits 6f8deaef2e9675d9 ("[PATCH] sh: port
           sh-sci driver to the new API") and 3f73fe878dc9210a ("[PATCH]
           Remove old sh-sci driver").
        2. On modern SoCs, the sh-sci parent clock rate is no longer related
           to the CPU clock rate anyway, so using a cpufreq notifier is
           futile.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() · 49b00338
      Eryu Guan authored
      commit 5e1021f2 upstream.
      
      ext4_reserve_inode_write() in ext4_mark_inode_dirty() can fail (e.g. with
      EIO), and iloc.bh can be NULL in that case.  But the error is ignored in
      the following "if" condition, so ext4_expand_extra_isize() might be
      called with a NULL iloc.bh, which triggers a NULL pointer dereference.
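      A userspace sketch of the corrected control flow. struct iloc_model,
      reserve_inode_write_model() and mark_inode_dirty_model() are hypothetical
      stand-ins for struct ext4_iloc and the ext4 functions, and the real
      journaling and extra-isize logic is omitted; the only point is that the
      error is checked before iloc.bh is used.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a buffer head and struct ext4_iloc. */
struct buffer_model { int dirty; };
struct iloc_model { struct buffer_model *bh; };

/* Models ext4_reserve_inode_write(): on failure it returns an error and leaves bh NULL. */
static int reserve_inode_write_model(int fail, struct buffer_model *buf,
				     struct iloc_model *iloc)
{
	if (fail) {
		iloc->bh = NULL;
		return -EIO;
	}
	iloc->bh = buf;
	return 0;
}

/* Models the fixed flow: return on error before anything touches iloc.bh. */
static int mark_inode_dirty_model(int fail, struct buffer_model *buf)
{
	struct iloc_model iloc;
	int err = reserve_inode_write_model(fail, buf, &iloc);

	if (err)
		return err;      /* iloc.bh may be NULL here, so never dereference it */
	iloc.bh->dirty = 1;  /* stands in for the work that needs the buffer */
	return 0;
}
```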
      
      This was uncovered by commit 8b4953e1 ("ext4: reserve code points for
      the project quota feature"), which enlarges the ext4_inode size, and can
      be reproduced by running the following script on a new kernel but with
      an old mke2fs:
      
        #!/bin/bash
        mnt=/mnt/ext4
        devname=ext4-error
        dev=/dev/mapper/$devname
        fsimg=/home/fs.img
      
        trap cleanup 0 1 2 3 9 15
      
        cleanup()
        {
                umount $mnt >/dev/null 2>&1
                dmsetup remove $devname
                losetup -d $backend_dev
                rm -f $fsimg
                exit 0
        }
      
        rm -f $fsimg
        fallocate -l 1g $fsimg
        backend_dev=`losetup -f --show $fsimg`
        devsize=`blockdev --getsz $backend_dev`
      
        good_tab="0 $devsize linear $backend_dev 0"
        error_tab="0 $devsize error $backend_dev 0"
      
        dmsetup create $devname --table "$good_tab"
      
        mkfs -t ext4 $dev
        mount -t ext4 -o errors=continue,strictatime $dev $mnt
      
        dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
        echo 3 > /proc/sys/vm/drop_caches
        ls -l $mnt
        exit 0
      
      [ Patch changed to simplify the function a tiny bit. -- Ted ]
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors · f3f0b32c
      Michael Hennerich authored
      commit f3df53e4 upstream.
      
      Fix RDAC read back errors caused by a typo. Value must shift by 2.
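      A sketch of what "shift by 2" means for the decoded value. The frame
      layout is an assumption for illustration: the 8-bit wiper position sits
      in bits [9:2] of a 10-bit data field, with the two low bits unused.
      RDAC_DATA_MASK, RDAC_SHIFT and decode_rdac() are hypothetical names, not
      the driver's.

```c
#include <stdint.h>

/*
 * Assumed layout: the readback word carries the 8-bit wiper position
 * left-aligned in a 10-bit data field, i.e. in bits [9:2].
 */
#define RDAC_DATA_MASK 0x03FFu  /* low 10 bits hold the data field */
#define RDAC_SHIFT     2u       /* drop the two unused low bits */

static uint8_t decode_rdac(uint16_t raw)
{
	return (uint8_t)((raw & RDAC_DATA_MASK) >> RDAC_SHIFT);
}
```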
      
      Fixes: a4bd3949 ("drivers/misc/ad525x_dpot.c: new features")
      Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rtc: vr41xx: Wire up alarm_irq_enable · 6e2e59c3
      Geert Uytterhoeven authored
      commit a25f4a95 upstream.
      
      drivers/rtc/rtc-vr41xx.c:229: warning: ‘vr41xx_rtc_alarm_irq_enable’ defined but not used
      
      Apparently the conversion to alarm_irq_enable forgot to wire up the
      callback.
      
      Fixes: 16380c15 ("RTC: Convert rtc drivers to use the alarm_irq_enable method")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rtc: hym8563: fix invalid year calculation · 48257cde
      Alexander Kochetkov authored
      commit d5861262 upstream.
      
      According to the hym8563 datasheet, the year field must be in BCD
      format.

      Because of this bug, the year 2016 became 2010.
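      The arithmetic behind the symptom, using userspace reimplementations of
      the formulas in the kernel's bin2bcd()/bcd2bin() helpers. It assumes,
      for illustration, that the year is kept as years since 2000 and that the
      buggy path wrote it in binary while it was decoded as BCD: binary 16 is
      0x10, which reads back as BCD 10, i.e. 2010.

```c
#include <stdint.h>

/* Binary to packed BCD: tens digit in the high nibble, units in the low nibble. */
static uint8_t bin2bcd_model(unsigned int val)
{
	return (uint8_t)(((val / 10) << 4) | (val % 10));
}

/* Packed BCD to binary. */
static unsigned int bcd2bin_model(uint8_t val)
{
	return (val & 0x0F) + (val >> 4) * 10;
}
```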
      
      Fixes: dcaf0384 ("rtc: add hym8563 rtc-driver")
      Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • misc/bmp085: Enable building as a module · 314e5b76
      Ben Hutchings authored
      commit 50e6315d upstream.
      
      Commit 985087db 'misc: add support for bmp18x chips to the bmp085
      driver' changed the BMP085 config symbol to a boolean.  I see no
      reason why the shared code cannot be built as a module, so change it
      back to tristate.
      
      Fixes: 985087db ("misc: add support for bmp18x chips to the bmp085 driver")
      Cc: Eric Andersson <eric.andersson@unixphere.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fbdev: da8xx-fb: fix videomodes of lcd panels · 4d05780f
      Sushaanth Srirangapathi authored
      commit 713fced8 upstream.
      
      Commit 028cd86b ("video: da8xx-fb: fix the polarities of the
      hsync/vsync pulse") fixes polarities of HSYNC/VSYNC pulse but
      forgot to update known_lcd_panels[] which had sync values
      according to old logic. This breaks LCD at least on DA850 EVM.
      
      This patch fixes this issue and I have tested this for panel
      "Sharp_LK043T1DG01" using DA850 EVM board.
      
      Fixes: 028cd86b ("video: da8xx-fb: fix the polarities of the hsync/vsync pulse")
      Signed-off-by: Sushaanth Srirangapathi <sushaanth.s@ti.com>
      Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • paride: make 'verbose' parameter an 'int' again · 2c88eab5
      Arnd Bergmann authored
      commit dec63a4d upstream.
      
      gcc-6.0 found an ancient bug in the paride driver, which had a
      "module_param(verbose, bool, 0);" since before 2.6.12, but actually uses
      it to accept '0', '1' or '2' as arguments:
      
        drivers/block/paride/pd.c: In function 'pd_init_dev_parms':
        drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare]
         #define DBMSG(msg) ((verbose>1)?(msg):NULL)
      
      In 2012, Rusty did a cleanup patch that also changed the type of the
      variable to 'bool', which introduced what is now a gcc warning.
      
      This changes the type back to 'int' and adapts the module_param() line
      instead, so it should work as documented in case anyone ever cares about
      running the ancient driver with debugging.
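      A userspace sketch of why the comparison can never be true once the
      variable is a bool. dbmsg() is a hypothetical function version of the
      driver's DBMSG() macro (which reads the global verbose instead of taking
      a parameter), and store_as_bool() just shows what a C bool holds after
      assigning 2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Modeled on DBMSG(): the message is used only at verbosity 2 or higher. */
static const char *dbmsg(int verbose, const char *msg)
{
	return (verbose > 1) ? msg : NULL;
}

/* Once the variable is declared bool, 2 cannot be represented: conversion yields 1. */
static int store_as_bool(int requested)
{
	bool b = requested;
	return b;
}
```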
      
      Fixes: 90ab5ee9 ("module_param: make bool parameters really bool (drivers & misc)")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: usbip: fix potential out-of-bounds write · c9104ee0
      Ignat Korchagin authored
      commit b348d7dd upstream.
      
      Fix a potential out-of-bounds write to urb->transfer_buffer.

      usbip handles network communication directly in the kernel. When
      receiving a packet from its peer, the usbip code parses headers according
      to the protocol. As part of this parsing, urb->actual_length is filled
      in. Since the input for urb->actual_length comes from the network, it
      should be treated as untrusted. Any entity controlling the network may
      put any value in the input, and the preallocated urb->transfer_buffer
      may not be large enough to hold the data. Thus, a malicious entity is
      able to write arbitrary data to kernel memory.
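      A minimal sketch of the kind of check this calls for, assuming a
      simplified URB: the length announced by the peer is validated against
      the preallocated buffer size before anything is copied. struct urb_model
      and recv_payload() are hypothetical, and returning -EPROTO is an
      assumed error choice.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the URB fields involved. */
struct urb_model {
	unsigned char *transfer_buffer;
	size_t transfer_buffer_length;  /* size of the preallocated buffer */
	size_t actual_length;           /* set only after validation */
};

/*
 * Copy the received payload only if the length announced by the
 * (untrusted) peer fits in the preallocated buffer.
 */
static int recv_payload(struct urb_model *urb, const unsigned char *payload,
			size_t announced_len)
{
	if (announced_len > urb->transfer_buffer_length)
		return -EPROTO;  /* malformed or malicious packet: reject it */
	memcpy(urb->transfer_buffer, payload, announced_len);
	urb->actual_length = announced_len;
	return 0;
}
```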
      Signed-off-by: Ignat Korchagin <ignat.korchagin@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • workqueue: fix ghost PENDING flag while doing MQ IO · 89c269f2
      Roman Pen authored
      commit 346c09f8 upstream.
      
      The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
      with the following backtrace:
      
      [  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
      [  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
      [  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
      [  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
      [  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
      [  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
      [  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
      [  601.350965] Call Trace:
      [  601.351203]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
      [  601.351444]  [<ffffffff815b01d5>] schedule+0x35/0x80
      [  601.351709]  [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230
      [  601.351958]  [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220
      [  601.352208]  [<ffffffff810bd737>] ? ktime_get+0x37/0xa0
      [  601.352446]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
      [  601.352688]  [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110
      [  601.352951]  [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
      [  601.353196]  [<ffffffff815b093b>] bit_wait_io+0x1b/0x70
      [  601.353440]  [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90
      [  601.353689]  [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0
      [  601.353958]  [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40
      [  601.354200]  [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140
      [  601.354441]  [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30
      [  601.354688]  [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70
      [  601.354932]  [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50
      [  601.355193]  [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0
      [  601.355432]  [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100
      [  601.355679]  [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0
      [  601.355925]  [<ffffffff81198379>] vfs_write+0xa9/0x1a0
      [  601.356164]  [<ffffffff811c59d8>] kernel_write+0x38/0x50
      
      The underlying device is a null_blk, with default parameters:
      
        queue_mode    = MQ
        submit_queues = 1
      
      Verification that nullb0 has something inflight:
      
      root@pserver8:~# cat /sys/block/nullb0/inflight
             0        1
      root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
      ...
      /sys/block/nullb0/mq/0/cpu2/rq_list
      CTX pending:
              ffff8838038e2400
      ...
      
      During debugging it became clear that the stalled request is always
      inserted into the rq_list from the following path:
      
         save_stack_trace_tsk + 34
         blk_mq_insert_requests + 231
         blk_mq_flush_plug_list + 281
         blk_flush_plug_list + 199
         wait_on_page_bit + 192
         __filemap_fdatawait_range + 228
         filemap_fdatawait_range + 20
         filemap_write_and_wait_range + 63
         blkdev_fsync + 27
         vfs_fsync_range + 73
         blkdev_write_iter + 202
         __vfs_write + 170
         vfs_write + 169
         kernel_write + 56
      
      So blk_flush_plug_list() was called with from_schedule == true.

      If from_schedule is true, blk_mq_insert_requests() ultimately offloads
      execution of __blk_mq_run_hw_queue() to the kblockd workqueue, i.e. it
      calls kblockd_schedule_delayed_work_on().

      That means we race with another CPU that is about to execute the
      __blk_mq_run_hw_queue() work.
      
      Further debugging shows the following traces from different CPUs:
      
        CPU#0                                  CPU#1
        ----------------------------------     -------------------------------
        request A inserted
        STORE hctx->ctx_map[0] bit marked
        kblockd_schedule...() returns 1
        <schedule to kblockd workqueue>
                                               request B inserted
                                               STORE hctx->ctx_map[1] bit marked
                                               kblockd_schedule...() returns 0
        *** WORK PENDING bit is cleared ***
        flush_busy_ctxs() is executed, but
        bit 1, set by CPU#1, is not observed
      
      As a result, request B stayed pending forever.

      This behaviour can be explained by a speculative LOAD of hctx->ctx_map
      on CPU#0, which is reordered with the clear of the PENDING bit and
      executed _before_ the actual STORE of bit 1 on CPU#1.

      The proper fix is an explicit full barrier <mfence>, which guarantees
      that the clear of the PENDING bit is executed before all possible
      speculative LOADs or STOREs inside the actual work function.
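      A userspace analogue of the ordering argument using C11 atomics, under
      stated assumptions: struct work_model, insert_and_schedule() and
      run_work() are hypothetical, atomic_thread_fence(memory_order_seq_cst)
      stands in for a kernel full barrier, and the producer-side fence models
      the full ordering that value-returning bitops such as test_and_set_bit()
      already provide. With a fence on both sides, either the producer sees
      PENDING cleared and queues the work again, or the worker sees the new
      ctx_map bit, so the lost-request outcome in the trace above is ruled
      out. The test is single-threaded and only checks the bookkeeping.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one work item with a PENDING flag and a shared ctx_map. */
struct work_model {
	atomic_bool pending;
	atomic_uint ctx_map;  /* bit i set means software ctx i has queued requests */
};

/* Producer: publish the ctx bit, then try to claim PENDING. True if newly queued. */
static bool insert_and_schedule(struct work_model *w, unsigned int ctx_bit)
{
	atomic_fetch_or_explicit(&w->ctx_map, ctx_bit, memory_order_relaxed);
	/* Full barrier: the ctx_map store is ordered before PENDING is tested. */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_exchange_explicit(&w->pending, true, memory_order_relaxed);
}

/* Worker: clear PENDING, full barrier, then collect every marked context. */
static unsigned int run_work(struct work_model *w)
{
	atomic_store_explicit(&w->pending, false, memory_order_relaxed);
	/* The fix: without this fence the ctx_map load could be reordered before the clear. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_exchange_explicit(&w->ctx_map, 0u, memory_order_relaxed);
}
```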
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
      Cc: Michael Wang <yun.wang@profitbricks.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>