1. 20 May, 2017 40 commits
    • Dan Williams's avatar
      libnvdimm, pfn: fix 'npfns' vs section alignment · af0eb80e
      Dan Williams authored
      commit d5483fed upstream.
      
      Fix failures to create namespaces due to the vmem_altmap not advertising
      enough free space to store the memmap.
      
       WARNING: CPU: 15 PID: 8022 at arch/x86/mm/init_64.c:656 arch_add_memory+0xde/0xf0
       [..]
       Call Trace:
        dump_stack+0x63/0x83
        __warn+0xcb/0xf0
        warn_slowpath_null+0x1d/0x20
        arch_add_memory+0xde/0xf0
        devm_memremap_pages+0x244/0x440
        pmem_attach_disk+0x37e/0x490 [nd_pmem]
        nd_pmem_probe+0x7e/0xa0 [nd_pmem]
        nvdimm_bus_probe+0x71/0x120 [libnvdimm]
        driver_probe_device+0x2bb/0x460
        bind_store+0x114/0x160
        drv_attr_store+0x25/0x30
      
      In commit 658922e5 "libnvdimm, pfn: fix memmap reservation sizing"
      we arranged for the capacity to be allocated, but failed to also update
      the 'npfns' parameter. This leads to cases where there is enough
      capacity reserved to hold all the allocated sections, but
      vmemmap_populate_hugepages() still encounters -ENOMEM from
      altmap_alloc_block_buf().
      
      This fix is a stop-gap until we can teach the core memory hotplug
      implementation to permit sub-section hotplug.
      
      Fixes: 658922e5 ("libnvdimm, pfn: fix memmap reservation sizing")
      Reported-by: default avatarAnisha Allada <anisha.allada@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af0eb80e
    • Dan Williams's avatar
      libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering · a3ff3ebd
      Dan Williams authored
      commit 452bae0a upstream.
      
      A debug patch to turn the standard device_lock() into something that
      lockdep can analyze yielded the following:
      
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       4.11.0-rc4+ #106 Tainted: G           O
       -------------------------------------------------------
       lt-libndctl/1898 is trying to acquire lock:
        (&dev->nvdimm_mutex/3){+.+.+.}, at: [<ffffffffc023c948>] nd_attach_ndns+0x178/0x1b0 [libnvdimm]
      
       but task is already holding lock:
        (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffc022e0b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
              lock_acquire+0xf6/0x1f0
              __mutex_lock+0x88/0x980
              mutex_lock_nested+0x1b/0x20
              nvdimm_bus_lock+0x21/0x30 [libnvdimm]
              nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm]
              nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm]
              nd_pmem_probe+0x14/0x180 [nd_pmem]
              nvdimm_bus_probe+0xa9/0x260 [libnvdimm]
      
       -> #0 (&dev->nvdimm_mutex/3){+.+.+.}:
              __lock_acquire+0x1107/0x1280
              lock_acquire+0xf6/0x1f0
              __mutex_lock+0x88/0x980
              mutex_lock_nested+0x1b/0x20
              nd_attach_ndns+0x178/0x1b0 [libnvdimm]
              nd_namespace_store+0x308/0x3c0 [libnvdimm]
              namespace_store+0x87/0x220 [libnvdimm]
      
      In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'.
      
      Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect
      nd_{attach,detach}_ndns() operations.
      
      Fixes: 8c2f7e86 ("libnvdimm: infrastructure for btt devices")
      Reported-by: default avatarYi Zhang <yizhan@redhat.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3ff3ebd
    • Toshi Kani's avatar
      libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify · de21b800
      Toshi Kani authored
      commit b2518c78 upstream.
      
      The following BUG was observed when nd_pmem_notify() was called
      for a BTT device.  The use of a pmem_device pointer is not valid
      with BTT.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
       IP: nd_pmem_notify+0x30/0xf0 [nd_pmem]
       Call Trace:
        nd_device_notify+0x40/0x50
        child_notify+0x10/0x20
        device_for_each_child+0x50/0x90
        nd_region_notify+0x20/0x30
        nd_device_notify+0x40/0x50
        nvdimm_region_notify+0x27/0x30
        acpi_nfit_scrub+0x341/0x590 [nfit]
        process_one_work+0x197/0x450
        worker_thread+0x4e/0x4a0
        kthread+0x109/0x140
      
      Fix nd_pmem_notify() by setting nd_region and badblocks pointers
      properly for BTT.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Fixes: 71999466 ("libnvdimm: async notification support")
      Signed-off-by: default avatarToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de21b800
    • Dan Williams's avatar
      libnvdimm, region: fix flush hint detection crash · d2572f5b
      Dan Williams authored
      commit bc042fdf upstream.
      
      In the case where a dimm does not have any associated flush hints the
      ndrd->flush_wpq array may be uninitialized leading to crashes with the
      following signature:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
       IP: region_visible+0x10f/0x160 [libnvdimm]
      
       Call Trace:
        internal_create_group+0xbe/0x2f0
        sysfs_create_groups+0x40/0x80
        device_add+0x2d8/0x650
        nd_async_device_register+0x12/0x40 [libnvdimm]
        async_run_entry_fn+0x39/0x170
        process_one_work+0x212/0x6c0
        ? process_one_work+0x197/0x6c0
        worker_thread+0x4e/0x4a0
        kthread+0x10c/0x140
        ? process_one_work+0x6c0/0x6c0
        ? kthread_create_on_node+0x60/0x60
        ret_from_fork+0x31/0x40
      Reviewed-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Fixes: f284a4f2 ("libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()")
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2572f5b
    • Joeseph Chang's avatar
      ipmi: Fix kernel panic at ipmi_ssif_thread() · d517be51
      Joeseph Chang authored
      commit 6de65fcf upstream.
      
      msg_written_handler() may set ssif_info->multi_data to NULL
      when using ipmitool to write fru.
      
      Before setting ssif_info->multi_data to NULL, add new local
      pointer "data_to_send" and store correct i2c data pointer to
      it to fix NULL pointer kernel panic and incorrect ssif_info->multi_pos.
      Signed-off-by: default avatarJoeseph Chang <joechang@codeaurora.org>
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d517be51
    • Christoph Hellwig's avatar
      libata: reject passthrough WRITE SAME requests · 8f0fde5b
      Christoph Hellwig authored
      commit c6ade20f upstream.
      
      The WRITE SAME to TRIM translation rewrites the DATA OUT buffer.  While
      the SCSI code accomodates for this by passing a read-writable buffer
      userspace applications don't cater for this behavior.  In fact it can
      be used to rewrite e.g. a readonly file through mmap and should be
      considered as a security fix.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f0fde5b
    • Tejun Heo's avatar
      cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc() · a5938c09
      Tejun Heo authored
      commit a590b90d upstream.
      
      cgroup_get() expected to be called only on live cgroups and triggers
      warning on a dead cgroup; however, cgroup_sk_alloc() may be called
      while cloning a socket which is left in an empty and removed cgroup
      and thus may legitimately duplicate its reference on a dead cgroup.
      This currently triggers the following warning spuriously.
      
       WARNING: CPU: 14 PID: 0 at kernel/cgroup.c:490 cgroup_get+0x55/0x60
       ...
        [<ffffffff8107e123>] __warn+0xd3/0xf0
        [<ffffffff8107e20e>] warn_slowpath_null+0x1e/0x20
        [<ffffffff810ff465>] cgroup_get+0x55/0x60
        [<ffffffff81106061>] cgroup_sk_alloc+0x51/0xe0
        [<ffffffff81761beb>] sk_clone_lock+0x2db/0x390
        [<ffffffff817cce06>] inet_csk_clone_lock+0x16/0xc0
        [<ffffffff817e8173>] tcp_create_openreq_child+0x23/0x4b0
        [<ffffffff818601a1>] tcp_v6_syn_recv_sock+0x91/0x670
        [<ffffffff817e8b16>] tcp_check_req+0x3a6/0x4e0
        [<ffffffff81861ba3>] tcp_v6_rcv+0x693/0xa00
        [<ffffffff81837429>] ip6_input_finish+0x59/0x3e0
        [<ffffffff81837cb2>] ip6_input+0x32/0xb0
        [<ffffffff81837387>] ip6_rcv_finish+0x57/0xa0
        [<ffffffff81837ac8>] ipv6_rcv+0x318/0x4d0
        [<ffffffff817778c7>] __netif_receive_skb_core+0x2d7/0x9a0
        [<ffffffff81777fa6>] __netif_receive_skb+0x16/0x70
        [<ffffffff81778023>] netif_receive_skb_internal+0x23/0x80
        [<ffffffff817787d8>] napi_gro_frags+0x208/0x270
        [<ffffffff8168a9ec>] mlx4_en_process_rx_cq+0x74c/0xf40
        [<ffffffff8168b270>] mlx4_en_poll_rx_cq+0x30/0x90
        [<ffffffff81778b30>] net_rx_action+0x210/0x350
        [<ffffffff8188c426>] __do_softirq+0x106/0x2c7
        [<ffffffff81082bad>] irq_exit+0x9d/0xa0 [<ffffffff8188c0e4>] do_IRQ+0x54/0xd0
        [<ffffffff8188a63f>] common_interrupt+0x7f/0x7f <EOI>
        [<ffffffff8173d7e7>] cpuidle_enter+0x17/0x20
        [<ffffffff810bdfd9>] cpu_startup_entry+0x2a9/0x2f0
        [<ffffffff8103edd1>] start_secondary+0xf1/0x100
      
      This patch renames the existing cgroup_get() with the dead cgroup
      warning to cgroup_get_live() after cgroup_kn_lock_live() and
      introduces the new cgroup_get() which doesn't check whether the cgroup
      is live or dead.
      
      All existing cgroup_get() users except for cgroup_sk_alloc() are
      converted to use cgroup_get_live().
      
      Fixes: d979a39d ("cgroup: duplicate cgroup reference when cloning sockets")
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5938c09
    • Johan Hovold's avatar
      Bluetooth: hci_intel: add missing tty-device sanity check · 740f485d
      Johan Hovold authored
      commit dcb9cfaa upstream.
      
      Make sure to check the tty-device pointer before looking up the sibling
      platform device to avoid dereferencing a NULL-pointer when the tty is
      one end of a Unix98 pty.
      
      Fixes: 74cdad37 ("Bluetooth: hci_intel: Add runtime PM support")
      Fixes: 1ab1f239 ("Bluetooth: hci_intel: Add support for platform driver")
      Cc: Loic Poulain <loic.poulain@intel.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      740f485d
    • Johan Hovold's avatar
      Bluetooth: hci_bcm: add missing tty-device sanity check · a86af798
      Johan Hovold authored
      commit 95065a61 upstream.
      
      Make sure to check the tty-device pointer before looking up the sibling
      platform device to avoid dereferencing a NULL-pointer when the tty is
      one end of a Unix98 pty.
      
      Fixes: 0395ffc1 ("Bluetooth: hci_bcm: Add PM for BCM devices")
      Cc: Frederic Danis <frederic.danis@linux.intel.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a86af798
    • Szymon Janc's avatar
      Bluetooth: Fix user channel for 32bit userspace on 64bit kernel · ded39dd9
      Szymon Janc authored
      commit ab89f0bd upstream.
      
      Running 32bit userspace on 64bit kernel results in MSG_CMSG_COMPAT being
      defined as 0x80000000. This results in sendmsg failure if used from 32bit
      userspace running on 64bit kernel. Fix this by accounting for MSG_CMSG_COMPAT
      in flags check in hci_sock_sendmsg.
      Signed-off-by: default avatarSzymon Janc <szymon.janc@codecoup.pl>
      Signed-off-by: default avatarMarko Kiiskila <marko@runtime.io>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ded39dd9
    • Timur Tabi's avatar
      tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44 · 01f98533
      Timur Tabi authored
      commit 5a0722b8 upstream.
      
      Define a new early console name for Qualcomm Datacenter Technologies
      QDF2400 SOCs affected by erratum 44, instead of piggy-backing on "pl011".
      Previously, to enable traditional (non-SPCR) earlycon, the documentation
      said to specify "earlycon=pl011,<address>,qdf2400_e44", but the code was
      broken and this didn't actually work.
      
      So instead, the method for specifying the E44 work-around with traditional
      earlycon is "earlycon=qdf2400_e44,<address>".  Both methods of earlycon
      are now enabled with the same function.
      
      Fixes: e53e597f ("tty: pl011: fix earlycon work-around for QDF2400 erratum 44")
      Signed-off-by: default avatarTimur Tabi <timur@codeaurora.org>
      Tested-by: default avatarShanker Donthineni <shankerd@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01f98533
    • Wang YanQing's avatar
      tty: pty: Fix ldisc flush after userspace become aware of the data already · 34e01f92
      Wang YanQing authored
      commit 77dae613 upstream.
      
      While using emacs, cat or others' commands in konsole with recent
      kernels, I have met many times that CTRL-C freeze konsole. After
      konsole freeze I can't type anything, then I have to open a new one,
      it is very annoying.
      
      See bug report:
      https://bugs.kde.org/show_bug.cgi?id=175283
      
      The platform in that bug report is Solaris, but now the pty in linux
      has the same problem or the same behavior as Solaris :)
      
      It has high possibility to trigger the problem follow steps below:
      Note: In my test, BigFile is a text file whose size is bigger than 1G
      1:open konsole
      1:cat BigFile
      2:CTRL-C
      
      After some digging, I find out the reason is that commit 1d1d14da
      ("pty: Fix buffer flush deadlock") changes the behavior of pty_flush_buffer.
      
      Thread A                                 Thread B
      --------                                 --------
      1:n_tty_poll return POLLIN
                                               2:CTRL-C trigger pty_flush_buffer
                                                   tty_buffer_flush
                                                     n_tty_flush_buffer
      3:attempt to check count of chars:
        ioctl(fd, TIOCINQ, &available)
        available is equal to 0
      
      4:read(fd, buffer, avaiable)
        return 0
      
      5:konsole close fd
      
      Yes, I know we could use the same patch included in the BUG report as
      a workaround for linux platform too. But I think the data in ldisc is
      belong to application of another side, we shouldn't clear it when we
      want to flush write buffer of this side in pty_flush_buffer. So I think
      it is better to disable ldisc flush in pty_flush_buffer, because its new
      hehavior bring no benefit except that it mess up the behavior between
      POLLIN, and TIOCINQ or FIONREAD.
      
      Also I find no flush_buffer function in others' tty driver has the
      same behavior as current pty_flush_buffer.
      
      Fixes: 1d1d14da ("pty: Fix buffer flush deadlock")
      Signed-off-by: default avatarWang YanQing <udknight@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34e01f92
    • Johan Hovold's avatar
      serial: omap: suspend device on probe errors · aad32798
      Johan Hovold authored
      commit 77e6fe7f upstream.
      
      Make sure to actually suspend the device before returning after a failed
      (or deferred) probe.
      
      Note that autosuspend must be disabled before runtime pm is disabled in
      order to balance the usage count due to a negative autosuspend delay as
      well as to make the final put suspend the device synchronously.
      
      Fixes: 388bc262 ("omap-serial: Fix the error handling in the omap_serial probe")
      Cc: Shubhrajyoti D <shubhrajyoti@ti.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Acked-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aad32798
    • Johan Hovold's avatar
      serial: omap: fix runtime-pm handling on unbind · 9e85b5c7
      Johan Hovold authored
      commit 099bd73d upstream.
      
      An unbalanced and misplaced synchronous put was used to suspend the
      device on driver unbind, something which with a likewise misplaced
      pm_runtime_disable leads to external aborts when an open port is being
      removed.
      
      Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa024010
      ...
      [<c046e760>] (serial_omap_set_mctrl) from [<c046a064>] (uart_update_mctrl+0x50/0x60)
      [<c046a064>] (uart_update_mctrl) from [<c046a400>] (uart_shutdown+0xbc/0x138)
      [<c046a400>] (uart_shutdown) from [<c046bd2c>] (uart_hangup+0x94/0x190)
      [<c046bd2c>] (uart_hangup) from [<c045b760>] (__tty_hangup+0x404/0x41c)
      [<c045b760>] (__tty_hangup) from [<c045b794>] (tty_vhangup+0x1c/0x20)
      [<c045b794>] (tty_vhangup) from [<c046ccc8>] (uart_remove_one_port+0xec/0x260)
      [<c046ccc8>] (uart_remove_one_port) from [<c046ef4c>] (serial_omap_remove+0x40/0x60)
      [<c046ef4c>] (serial_omap_remove) from [<c04845e8>] (platform_drv_remove+0x34/0x4c)
      
      Fix this up by resuming the device before deregistering the port and by
      suspending and disabling runtime pm only after the port has been
      removed.
      
      Also make sure to disable autosuspend before disabling runtime pm so
      that the usage count is balanced and device actually suspended before
      returning.
      
      Note that due to a negative autosuspend delay being set in probe, the
      unbalanced put would actually suspend the device on first driver unbind,
      while rebinding and again unbinding would result in a negative
      power.usage_count.
      
      Fixes: 7e9c8e7d ("serial: omap: make sure to suspend device before remove")
      Cc: Felipe Balbi <balbi@kernel.org>
      Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Acked-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e85b5c7
    • Marek Szyprowski's avatar
      serial: samsung: Add missing checks for dma_map_single failure · cd823aa0
      Marek Szyprowski authored
      commit 500fcc08 upstream.
      
      This patch adds missing checks for dma_map_single() failure and proper error
      reporting. Although this issue was harmless on ARM architecture, it is always
      good to use the DMA mapping API in a proper way. This patch fixes the following
      DMA API debug warning:
      
      WARNING: CPU: 1 PID: 3785 at lib/dma-debug.c:1171 check_unmap+0x8a0/0xf28
      dma-pl330 121a0000.pdma: DMA-API: device driver failed to check map error[device address=0x000000006e0f9000] [size=4096 bytes] [mapped as single]
      Modules linked in:
      CPU: 1 PID: 3785 Comm: (agetty) Tainted: G        W       4.11.0-rc1-00137-g07ca963-dirty #59
      Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
      [<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24)
      [<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0)
      [<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180)
      [<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50)
      [<c01395a4>] (warn_slowpath_fmt) from [<c072a114>] (check_unmap+0x8a0/0xf28)
      [<c072a114>] (check_unmap) from [<c072a834>] (debug_dma_unmap_page+0x98/0xc8)
      [<c072a834>] (debug_dma_unmap_page) from [<c0803874>] (s3c24xx_serial_shutdown+0x314/0x52c)
      [<c0803874>] (s3c24xx_serial_shutdown) from [<c07f5124>] (uart_port_shutdown+0x54/0x88)
      [<c07f5124>] (uart_port_shutdown) from [<c07f522c>] (uart_shutdown+0xd4/0x110)
      [<c07f522c>] (uart_shutdown) from [<c07f6a8c>] (uart_hangup+0x9c/0x208)
      [<c07f6a8c>] (uart_hangup) from [<c07c426c>] (__tty_hangup+0x49c/0x634)
      [<c07c426c>] (__tty_hangup) from [<c07c78ac>] (tty_ioctl+0xc88/0x16e4)
      [<c07c78ac>] (tty_ioctl) from [<c03b5f2c>] (do_vfs_ioctl+0xc4/0xd10)
      [<c03b5f2c>] (do_vfs_ioctl) from [<c03b6bf4>] (SyS_ioctl+0x7c/0x8c)
      [<c03b6bf4>] (SyS_ioctl) from [<c010b4a0>] (ret_fast_syscall+0x0/0x3c)
      Reported-by: default avatarSeung-Woo Kim <sw0312.kim@samsung.com>
      Fixes: 62c37eed ("serial: samsung: add dma reqest/release functions")
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Reviewed-by: default avatarShuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd823aa0
    • Marek Szyprowski's avatar
      serial: samsung: Use right device for DMA-mapping calls · 43a1c1ff
      Marek Szyprowski authored
      commit 768d64f4 upstream.
      
      Driver should provide its own struct device for all DMA-mapping calls instead
      of extracting device pointer from DMA engine channel. Although this is harmless
      from the driver operation perspective on ARM architecture, it is always good
      to use the DMA mapping API in a proper way. This patch fixes following DMA API
      debug warning:
      
      WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1241 check_sync+0x520/0x9f4
      samsung-uart 12c20000.serial: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000006df0f580] [size=64 bytes]
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00137-g07ca963 #51
      Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
      [<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24)
      [<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0)
      [<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180)
      [<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50)
      [<c01395a4>] (warn_slowpath_fmt) from [<c0729058>] (check_sync+0x520/0x9f4)
      [<c0729058>] (check_sync) from [<c072967c>] (debug_dma_sync_single_for_device+0x88/0xc8)
      [<c072967c>] (debug_dma_sync_single_for_device) from [<c0803c10>] (s3c24xx_serial_start_tx_dma+0x100/0x2f8)
      [<c0803c10>] (s3c24xx_serial_start_tx_dma) from [<c0804338>] (s3c24xx_serial_tx_chars+0x198/0x33c)
      Reported-by: default avatarSeung-Woo Kim <sw0312.kim@samsung.com>
      Fixes: 62c37eed ("serial: samsung: add dma reqest/release functions")
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Reviewed-by: default avatarShuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43a1c1ff
    • Eric Biggers's avatar
      fscrypt: avoid collisions when presenting long encrypted filenames · ab62f411
      Eric Biggers authored
      commit 6b06cdee upstream.
      
      When accessing an encrypted directory without the key, userspace must
      operate on filenames derived from the ciphertext names, which contain
      arbitrary bytes.  Since we must support filenames as long as NAME_MAX,
      we can't always just base64-encode the ciphertext, since that may make
      it too long.  Currently, this is solved by presenting long names in an
      abbreviated form containing any needed filesystem-specific hashes (e.g.
      to identify a directory block), then the last 16 bytes of ciphertext.
      This needs to be sufficient to identify the actual name on lookup.
      
      However, there is a bug.  It seems to have been assumed that due to the
      use of a CBC (ciphertext block chaining)-based encryption mode, the last
      16 bytes (i.e. the AES block size) of ciphertext would depend on the
      full plaintext, preventing collisions.  However, we actually use CBC
      with ciphertext stealing (CTS), which handles the last two blocks
      specially, causing them to appear "flipped".  Thus, it's actually the
      second-to-last block which depends on the full plaintext.
      
      This caused long filenames that differ only near the end of their
      plaintexts to, when observed without the key, point to the wrong inode
      and be undeletable.  For example, with ext4:
      
          # echo pass | e4crypt add_key -p 16 edir/
          # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
          # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
          100000
          # sync
          # echo 3 > /proc/sys/vm/drop_caches
          # keyctl new_session
          # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
          2004
          # rm -rf edir/
          rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning
          ...
      
      To fix this, when presenting long encrypted filenames, encode the
      second-to-last block of ciphertext rather than the last 16 bytes.
      
      Although it would be nice to solve this without depending on a specific
      encryption mode, that would mean doing a cryptographic hash like SHA-256
      which would be much less efficient.  This way is sufficient for now, and
      it's still compatible with encryption modes like HEH which are strong
      pseudorandom permutations.  Also, changing the presented names is still
      allowed at any time because they are only provided to allow applications
      to do things like delete encrypted directories.  They're not designed to
      be used to persistently identify files --- which would be hard to do
      anyway, given that they're encrypted after all.
      
      For ease of backports, this patch only makes the minimal fix to both
      ext4 and f2fs.  It leaves ubifs as-is, since ubifs doesn't compare the
      ciphertext block yet.  Follow-on patches will clean things up properly
      and make the filesystems use a shared helper function.
      
      Fixes: 5de0b4d0 ("ext4 crypto: simplify and speed up filename encryption")
      Reported-by: default avatarGwendal Grignou <gwendal@chromium.org>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab62f411
    • Eric Biggers's avatar
      fscrypt: fix context consistency check when key(s) unavailable · eb86b4c6
      Eric Biggers authored
      commit 272f98f6 upstream.
      
      To mitigate some types of offline attacks, filesystem encryption is
      designed to enforce that all files in an encrypted directory tree use
      the same encryption policy (i.e. the same encryption context excluding
      the nonce).  However, the fscrypt_has_permitted_context() function which
      enforces this relies on comparing struct fscrypt_info's, which are only
      available when we have the encryption keys.  This can cause two
      incorrect behaviors:
      
      1. If we have the parent directory's key but not the child's key, or
         vice versa, then fscrypt_has_permitted_context() returned false,
         causing applications to see EPERM or ENOKEY.  This is incorrect if
         the encryption contexts are in fact consistent.  Although we'd
         normally have either both keys or neither key in that case since the
         master_key_descriptors would be the same, this is not guaranteed
         because keys can be added or removed from keyrings at any time.
      
      2. If we have neither the parent's key nor the child's key, then
         fscrypt_has_permitted_context() returned true, causing applications
         to see no error (or else an error for some other reason).  This is
         incorrect if the encryption contexts are in fact inconsistent, since
         in that case we should deny access.
      
      To fix this, retrieve and compare the fscrypt_contexts if we are unable
      to set up both fscrypt_infos.
      
      While this slightly hurts performance when accessing an encrypted
      directory tree without the key, this isn't a case we really need to be
      optimizing for; access *with* the key is much more important.
      Furthermore, the performance hit is barely noticeable given that we are
      already retrieving the fscrypt_context and doing two keyring searches in
      fscrypt_get_encryption_info().  If we ever actually wanted to optimize
      this case we might start by caching the fscrypt_contexts.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb86b4c6
    • Linus Torvalds's avatar
      initramfs: avoid "label at end of compound statement" error · 602f8911
      Linus Torvalds authored
      commit 394e4f5d upstream.
      
      Commit 17a9be31 ("initramfs: Always do fput() and load modules after
      rootfs populate") introduced an error for the
      
          CONFIG_BLK_DEV_RAM=y
      
      case, because even though the code looks fine, the compiler really wants
      a statement after a label, or you'll get complaints:
      
        init/initramfs.c: In function 'populate_rootfs':
        init/initramfs.c:644:2: error: label at end of compound statement
      
      That commit moved the subsequent statements to outside the compound
      statement, leaving the label without any associated statements.
      Reported-by: default avatarJörg Otte <jrg.otte@gmail.com>
      Fixes: 17a9be31 ("initramfs: Always do fput() and load modules after rootfs populate")
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      602f8911
    • Stafford Horne's avatar
      initramfs: Always do fput() and load modules after rootfs populate · e54af78e
      Stafford Horne authored
      commit 17a9be31 upstream.
      
      In OpenRISC we do not have a bootloader passed initrd, but the built in
      initramfs does contain the /init and other binaries, including modules.
      The previous commit 08865514 ("initramfs: finish fput() before
      accessing any binary from initramfs") made a change to only call fput()
      if the bootloader initrd was available, this caused intermittent crashes
      for OpenRISC.
      
      This patch changes the fput() to happen unconditionally if any rootfs is
      loaded. Also, I added some comments to make it a bit more clear why we
      call unpack_to_rootfs() multiple times.
      
      Fixes: 08865514 ("initramfs: finish fput() before accessing any binary from initramfs")
      Cc: Lokesh Vutla <lokeshvutla@ti.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarStafford Horne <shorne@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e54af78e
    • Jan Kara's avatar
      f2fs: Make flush bios explicitely sync · 9b3be66c
      Jan Kara authored
      commit 3adc5fcb upstream.
      
      Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
      definitions.  generic_make_request_checks() however strips REQ_FUA and
      REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
      write cache and thus write effectively becomes asynchronous which can
      lead to performance regressions.
      
      Fix the problem by making sure all bios which are synchronous are
      properly marked with REQ_SYNC.
      
      Fixes: b685d3d6
      CC: Jaegeuk Kim <jaegeuk@kernel.org>
      CC: linux-f2fs-devel@lists.sourceforge.net
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b3be66c
    • Jaegeuk Kim's avatar
      f2fs: check entire encrypted bigname when finding a dentry · bd3dfe50
      Jaegeuk Kim authored
      commit 6332cd32 upstream.
      
      If user has no key under an encrypted dir, fscrypt gives digested dentries.
      Previously, when looking up a dentry, f2fs only checks its hash value with
      first 4 bytes of the digested dentry, which didn't handle hash collisions fully.
      This patch enhances to check entire dentry bytes likewise ext4.
      
      Eric reported how to reproduce this issue by:
      
       # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
       # find edir -type f | xargs stat -c %i | sort | uniq | wc -l
      100000
       # sync
       # echo 3 > /proc/sys/vm/drop_caches
       # keyctl new_session
       # find edir -type f | xargs stat -c %i | sort | uniq | wc -l
      99999
      Reported-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      (fixed f2fs_dentry_hash() to work even when the hash is 0)
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd3dfe50
    • Sheng Yong's avatar
      f2fs: fix multiple f2fs_add_link() having same name for inline dentry · 3a625468
      Sheng Yong authored
      commit d3bb910c upstream.
      
      Commit 88c5c13a (f2fs: fix multiple f2fs_add_link() calls having
      same name) does not cover the scenario where inline dentry is enabled.
      In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will
      lookup dentries one more time.
      
      This patch fixes it by moving the assigment of current task to a upper
      level to cover both normal and inline dentry.
      
      Fixes: 88c5c13a (f2fs: fix multiple f2fs_add_link() calls having same name)
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a625468
    • Jaegeuk Kim's avatar
      f2fs: fix fs corruption due to zero inode page · e71f0996
      Jaegeuk Kim authored
      commit 9bb02c36 upstream.
      
      This patch fixes the following scenario.
      
      - f2fs_create/f2fs_mkdir             - write_checkpoint
       - f2fs_mark_inode_dirty_sync         - block_operations
                                             - f2fs_lock_all
                                             - f2fs_sync_inode_meta
                                              - f2fs_unlock_all
                                              - sync_inode_metadata
       - f2fs_lock_op
                                               - f2fs_write_inode
                                                - update_inode_page
                                                 - get_node_page
                                                   return -ENOENT
       - new_inode_page
        - fill_node_footer
       - f2fs_mark_inode_dirty_sync
       - ...
       - f2fs_unlock_op
                                                - f2fs_inode_synced
                                             - f2fs_lock_all
                                             - do_checkpoint
      
      In this checkpoint, we can get an inode page which contains zeros having valid
      node footer only.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e71f0996
    • Jaegeuk Kim's avatar
      Revert "f2fs: put allocate_segment after refresh_sit_entry" · ed4d26a1
      Jaegeuk Kim authored
      commit c6f82fe9 upstream.
      
      This reverts commit 3436c4bd.
      
      This makes a leak to register dirty segments. I reproduced the issue by
      modified postmark which injects a lot of file create/delete/update and
      finally triggers huge number of SSR allocations.
      
      [Jaegeuk Kim: Change missing incorrect comment]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed4d26a1
    • Jaegeuk Kim's avatar
      f2fs: fix wrong max cost initialization · 33cbcc25
      Jaegeuk Kim authored
      commit c541a51b upstream.
      
      This patch fixes missing increased max cost caused by a patch that we increased
      cose of data segments in greedy algorithm.
      
      Fixes: b9cd2061 "f2fs: node segment is prior to data segment selected victim"
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33cbcc25
    • Ross Zwisler's avatar
      dax: fix PMD data corruption when fault races with write · 03606c8e
      Ross Zwisler authored
      commit 876f2946 upstream.
      
      This is based on a patch from Jan Kara that fixed the equivalent race in
      the DAX PTE fault path.
      
      Currently DAX PMD read fault can race with write(2) in the following
      way:
      
      CPU1 - write(2)                 CPU2 - read fault
                                      dax_iomap_pmd_fault()
                                        ->iomap_begin() - sees hole
      
      dax_iomap_rw()
        iomap_apply()
          ->iomap_begin - allocates blocks
          dax_iomap_actor()
            invalidate_inode_pages2_range()
              - there's nothing to invalidate
      
                                        grab_mapping_entry()
      				  - we add huge zero page to the radix tree
      				    and map it to page tables
      
      The result is that hole page is mapped into page tables (and thus zeros
      are seen in mmap) while file has data written in that place.
      
      Fix the problem by locking exception entry before mapping blocks for the
      fault.  That way we are sure invalidate_inode_pages2_range() call for
      racing write will either block on entry lock waiting for the fault to
      finish (and unmap stale page tables after that) or read fault will see
      already allocated blocks by write(2).
      
      Fixes: 9f141d6e ("dax: Call ->iomap_begin without entry lock during dax fault")
      Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03606c8e
    • Jan Kara's avatar
      ext4: return to starting transaction in ext4_dax_huge_fault() · 5a3651b4
      Jan Kara authored
      commit fb26a1cb upstream.
      
      DAX will return to locking exceptional entry before mapping blocks for a
      page fault to fix possible races with concurrent writes.  To avoid lock
      inversion between exceptional entry lock and transaction start, start
      the transaction already in ext4_dax_huge_fault().
      
      Fixes: 9f141d6e
      Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.czSigned-off-by: default avatarJan Kara <jack@suse.cz>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a3651b4
    • Jan Kara's avatar
      mm: fix data corruption due to stale mmap reads · 85f353db
      Jan Kara authored
      commit cd656375 upstream.
      
      Currently, we didn't invalidate page tables during invalidate_inode_pages2()
      for DAX.  That could result in e.g. 2MiB zero page being mapped into
      page tables while there were already underlying blocks allocated and
      thus data seen through mmap were different from data seen by read(2).
      The following sequence reproduces the problem:
      
       - open an mmap over a 2MiB hole
      
       - read from a 2MiB hole, faulting in a 2MiB zero page
      
       - write to the hole with write(3p). The write succeeds but we
         incorrectly leave the 2MiB zero page mapping intact.
      
       - via the mmap, read the data that was just written. Since the zero
         page mapping is still intact we read back zeroes instead of the new
         data.
      
      Fix the problem by unconditionally calling invalidate_inode_pages2_range()
      in dax_iomap_actor() for new block allocations and by properly
      invalidating page tables in invalidate_inode_pages2_range() for DAX
      mappings.
      
      Fixes: c6dcf52c
      Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.czSigned-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85f353db
    • Ross Zwisler's avatar
      dax: prevent invalidation of mapped DAX entries · 06305f51
      Ross Zwisler authored
      commit 4636e70b upstream.
      
      Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
      v4.
      
      This series fixes data corruption that can happen for DAX mounts when
      page faults race with write(2) and as a result page tables get out of
      sync with block mappings in the filesystem and thus data seen through
      mmap is different from data seen through read(2).
      
      The series passes testing with t_mmap_stale test program from Ross and
      also other mmap related tests on DAX filesystem.
      
      This patch (of 4):
      
      dax_invalidate_mapping_entry() currently removes DAX exceptional entries
      only if they are clean and unlocked.  This is done via:
      
        invalidate_mapping_pages()
          invalidate_exceptional_entry()
            dax_invalidate_mapping_entry()
      
      However, for page cache pages removed in invalidate_mapping_pages()
      there is an additional criteria which is that the page must not be
      mapped.  This is noted in the comments above invalidate_mapping_pages()
      and is checked in invalidate_inode_page().
      
      For DAX entries this means that we can can end up in a situation where a
      DAX exceptional entry, either a huge zero page or a regular DAX entry,
      could end up mapped but without an associated radix tree entry.  This is
      inconsistent with the rest of the DAX code and with what happens in the
      page cache case.
      
      We aren't able to unmap the DAX exceptional entry because according to
      its comments invalidate_mapping_pages() isn't allowed to block, and
      unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
      
      Since we essentially never have unmapped DAX entries to evict from the
      radix tree, just remove dax_invalidate_mapping_entry().
      
      Fixes: c6dcf52c ("mm: Invalidate DAX radix tree entries only if appropriate")
      Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.czSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reported-by: default avatarJan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      06305f51
    • Dan Williams's avatar
      device-dax: fix sysfs attribute deadlock · 55dbfd7d
      Dan Williams authored
      commit 565851c9 upstream.
      
      Usage of device_lock() for dax_region attributes is unnecessary and
      deadlock prone. It's unnecessary because the order of registration /
      un-registration guarantees that drvdata is always valid. It's deadlock
      prone because it sets up this situation:
      
       ndctl           D    0  2170   2082 0x00000000
       Call Trace:
        __schedule+0x31f/0x980
        schedule+0x3d/0x90
        schedule_preempt_disabled+0x15/0x20
        __mutex_lock+0x402/0x980
        ? __mutex_lock+0x158/0x980
        ? align_show+0x2b/0x80 [dax]
        ? kernfs_seq_start+0x2f/0x90
        mutex_lock_nested+0x1b/0x20
        align_show+0x2b/0x80 [dax]
        dev_attr_show+0x20/0x50
      
       ndctl           D    0  2186   2079 0x00000000
       Call Trace:
        __schedule+0x31f/0x980
        schedule+0x3d/0x90
        __kernfs_remove+0x1f6/0x340
        ? kernfs_remove_by_name_ns+0x45/0xa0
        ? remove_wait_queue+0x70/0x70
        kernfs_remove_by_name_ns+0x45/0xa0
        remove_files.isra.1+0x35/0x70
        sysfs_remove_group+0x44/0x90
        sysfs_remove_groups+0x2e/0x50
        dax_region_unregister+0x25/0x40 [dax]
        devm_action_release+0xf/0x20
        release_nodes+0x16d/0x2b0
        devres_release_all+0x3c/0x60
        device_release_driver_internal+0x17d/0x220
        device_release_driver+0x12/0x20
        unbind_store+0x112/0x160
      
      ndctl/2170 is trying to acquire the device_lock() to read an attribute,
      and ndctl/2186 is holding the device_lock() while trying to drain all
      active attribute readers.
      
      Thanks to Yi Zhang for the reproduction script.
      
      Fixes: d7fe1a67 ("dax: add region 'id', 'size', and 'align' attributes")
      Reported-by: default avatarYi Zhang <yizhan@redhat.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55dbfd7d
    • Dan Williams's avatar
      device-dax: fix cdev leak · 89314777
      Dan Williams authored
      commit ed01e50a upstream.
      
      If device_add() fails, cleanup the cdev. Otherwise, we leak a kobj_map()
      with a stale device number.
      
      As Jason points out, there is a small possibility that userspace has
      opened and mapped the device in the time between cdev_add() and the
      device_add() failure. We need a new kill_dax_dev() helper to invalidate
      any established mappings.
      
      Fixes: ba09c01d ("dax: convert to the cdev api")
      Reported-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89314777
    • NeilBrown's avatar
      md/raid1: avoid reusing a resync bio after error handling. · 9bcb9cc9
      NeilBrown authored
      commit 0c9d5b12 upstream.
      
      fix_sync_read_error() modifies a bio on a newly faulty
      device by setting bi_end_io to end_sync_write.
      This ensure that put_buf() will still call rdev_dec_pending()
      as required, but makes sure that subsequent code in
      fix_sync_read_error() doesn't try to read from the device.
      
      Unfortunately this interacts badly with sync_request_write()
      which assumes that any bio with bi_end_io set to non-NULL
      other than end_sync_read is safe to write to.
      
      As the device is now faulty it doesn't make sense to write.
      As the bio was recently used for a read, it is "dirty"
      and not suitable for immediate submission.
      In particular, ->bi_next might be non-NULL, which will cause
      generic_make_request() to complain.
      
      Break this interaction by refusing to write to devices
      which are marked as Faulty.
      Reported-and-tested-by: default avatarMichael Wang <yun.wang@profitbricks.com>
      Fixes: 2e52d449 ("md/raid1: add failfast handling for reads.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9bcb9cc9
    • Jason A. Donenfeld's avatar
      padata: free correct variable · 7d708085
      Jason A. Donenfeld authored
      commit 07a77929 upstream.
      
      The author meant to free the variable that was just allocated, instead
      of the one that failed to be allocated, but made a simple typo. This
      patch rectifies that.
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d708085
    • Amir Goldstein's avatar
      ovl: do not set overlay.opaque on non-dir create · f33f17b2
      Amir Goldstein authored
      commit 4a99f3c8 upstream.
      
      The optimization for opaque dir create was wrongly being applied
      also to non-dir create.
      
      Fixes: 97c684cc ("ovl: create directories inside merged parent opaque")
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f33f17b2
    • Björn Jacke's avatar
      CIFS: add misssing SFM mapping for doublequote · 6a1baa3a
      Björn Jacke authored
      commit 85435d7a upstream.
      
      SFM is mapping doublequote to 0xF020
      
      Without this patch creating files with doublequote fails to Windows/Mac
      Signed-off-by: default avatarBjoern Jacke <bjacke@samba.org>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a1baa3a
    • David Disseldorp's avatar
      cifs: fix CIFS_IOC_GET_MNT_INFO oops · c4f35cc8
      David Disseldorp authored
      commit d8a6e505 upstream.
      
      An open directory may have a NULL private_data pointer prior to readdir.
      
      Fixes: 0de1f4c6 ("Add way to query server fs info for smb3")
      Signed-off-by: default avatarDavid Disseldorp <ddiss@suse.de>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4f35cc8
    • Rabin Vincent's avatar
      CIFS: fix oplock break deadlocks · 33fa0116
      Rabin Vincent authored
      commit 3998e6b8 upstream.
      
      When the final cifsFileInfo_put() is called from cifsiod and an oplock
      break work is queued, lockdep complains loudly:
      
       =============================================
       [ INFO: possible recursive locking detected ]
       4.11.0+ #21 Not tainted
       ---------------------------------------------
       kworker/0:2/78 is trying to acquire lock:
        ("cifsiod"){++++.+}, at: flush_work+0x215/0x350
      
       but task is already holding lock:
        ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock("cifsiod");
         lock("cifsiod");
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       2 locks held by kworker/0:2/78:
        #0:  ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
        #1:  ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0
      
       stack backtrace:
       CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
       Workqueue: cifsiod cifs_writev_complete
       Call Trace:
        dump_stack+0x85/0xc2
        __lock_acquire+0x17dd/0x2260
        ? match_held_lock+0x20/0x2b0
        ? trace_hardirqs_off_caller+0x86/0x130
        ? mark_lock+0xa6/0x920
        lock_acquire+0xcc/0x260
        ? lock_acquire+0xcc/0x260
        ? flush_work+0x215/0x350
        flush_work+0x236/0x350
        ? flush_work+0x215/0x350
        ? destroy_worker+0x170/0x170
        __cancel_work_timer+0x17d/0x210
        ? ___preempt_schedule+0x16/0x18
        cancel_work_sync+0x10/0x20
        cifsFileInfo_put+0x338/0x7f0
        cifs_writedata_release+0x2a/0x40
        ? cifs_writedata_release+0x2a/0x40
        cifs_writev_complete+0x29d/0x850
        ? preempt_count_sub+0x18/0xd0
        process_one_work+0x304/0x8e0
        worker_thread+0x9b/0x6a0
        kthread+0x1b2/0x200
        ? process_one_work+0x8e0/0x8e0
        ? kthread_create_on_node+0x40/0x40
        ret_from_fork+0x31/0x40
      
      This is a real warning.  Since the oplock is queued on the same
      workqueue this can deadlock if there is only one worker thread active
      for the workqueue (which will be the case during memory pressure when
      the rescuer thread is handling it).
      
      Furthermore, there is at least one other kind of hang possible due to
      the oplock break handling if there is only worker.  (This can be
      reproduced without introducing memory pressure by having passing 1 for
      the max_active parameter of cifsiod.) cifs_oplock_break() can wait
      indefintely in the filemap_fdatawait() while the cifs_writev_complete()
      work is blocked:
      
       sysrq: SysRq : Show Blocked State
         task                        PC stack   pid father
       kworker/0:1     D    0    16      2 0x00000000
       Workqueue: cifsiod cifs_oplock_break
       Call Trace:
        __schedule+0x562/0xf40
        ? mark_held_locks+0x4a/0xb0
        schedule+0x57/0xe0
        io_schedule+0x21/0x50
        wait_on_page_bit+0x143/0x190
        ? add_to_page_cache_lru+0x150/0x150
        __filemap_fdatawait_range+0x134/0x190
        ? do_writepages+0x51/0x70
        filemap_fdatawait_range+0x14/0x30
        filemap_fdatawait+0x3b/0x40
        cifs_oplock_break+0x651/0x710
        ? preempt_count_sub+0x18/0xd0
        process_one_work+0x304/0x8e0
        worker_thread+0x9b/0x6a0
        kthread+0x1b2/0x200
        ? process_one_work+0x8e0/0x8e0
        ? kthread_create_on_node+0x40/0x40
        ret_from_fork+0x31/0x40
       dd              D    0   683    171 0x00000000
       Call Trace:
        __schedule+0x562/0xf40
        ? mark_held_locks+0x29/0xb0
        schedule+0x57/0xe0
        io_schedule+0x21/0x50
        wait_on_page_bit+0x143/0x190
        ? add_to_page_cache_lru+0x150/0x150
        __filemap_fdatawait_range+0x134/0x190
        ? do_writepages+0x51/0x70
        filemap_fdatawait_range+0x14/0x30
        filemap_fdatawait+0x3b/0x40
        filemap_write_and_wait+0x4e/0x70
        cifs_flush+0x6a/0xb0
        filp_close+0x52/0xa0
        __close_fd+0xdc/0x150
        SyS_close+0x33/0x60
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      
       Showing all locks held in the system:
       2 locks held by kworker/0:1/16:
        #0:  ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
        #1:  ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0
      
       Showing busy workqueues and worker pools:
       workqueue cifsiod: flags=0xc
         pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
           in-flight: 16:cifs_oplock_break
           delayed: cifs_writev_complete, cifs_echo_request
       pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3
      
      Fix these problems by creating a a new workqueue (with a rescuer) for
      the oplock break work.
      Signed-off-by: default avatarRabin Vincent <rabinv@axis.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33fa0116
    • David Disseldorp's avatar
      cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops · 076c404c
      David Disseldorp authored
      commit 6026685d upstream.
      
      As with 61876395, an open directory may have a NULL private_data
      pointer prior to readdir. CIFS_ENUMERATE_SNAPSHOTS must check for this
      before dereference.
      
      Fixes: 834170c8 ("Enable previous version support")
      Signed-off-by: default avatarDavid Disseldorp <ddiss@suse.de>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      076c404c
    • David Disseldorp's avatar
      cifs: fix leak in FSCTL_ENUM_SNAPS response handling · 63d86cd9
      David Disseldorp authored
      commit 0e5c7955 upstream.
      
      The server may respond with success, and an output buffer less than
      sizeof(struct smb_snapshot_array) in length. Do not leak the output
      buffer in this case.
      
      Fixes: 834170c8 ("Enable previous version support")
      Signed-off-by: default avatarDavid Disseldorp <ddiss@suse.de>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63d86cd9