1. 21 Nov, 2018 24 commits
    • Amir Goldstein's avatar
      ovl: fix error handling in ovl_verify_set_fh() · 0afa17be
      Amir Goldstein authored
      commit babf4770 upstream.
      
      We hit a BUG on kfree of an ERR_PTR()...
      
      Reported-by: syzbot+ff03fe05c717b82502d0@syzkaller.appspotmail.com
      Fixes: 8b88a2e6 ("ovl: verify upper root dir matches lower root dir")
      Cc: <stable@vger.kernel.org> # v4.13
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0afa17be
    • Young_X's avatar
      cdrom: fix improper type cast, which can leat to information leak. · a8c254d8
      Young_X authored
      commit e4f3aa2e upstream.
      
      There is another cast from unsigned long to int which causes
      a bounds check to fail with specially crafted input. The value is
      then used as an index in the slot array in cdrom_slot_status().
      
      This issue is similar to CVE-2018-16658 and CVE-2018-10940.
      Signed-off-by: default avatarYoung_X <YangX92@hotmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8c254d8
    • Dominique Martinet's avatar
      9p: clear dangling pointers in p9stat_free · 18280c12
      Dominique Martinet authored
      [ Upstream commit 62e39417 ]
      
      p9stat_free is more of a cleanup function than a 'free' function as it
      only frees the content of the struct; there are chances of use-after-free
      if it is improperly used (e.g. p9stat_free called twice as it used to be
      possible to)
      
      Clearing dangling pointers makes the function idempotent and safer to use.
      
      Link: http://lkml.kernel.org/r/1535410108-20650-2-git-send-email-asmadeus@codewreck.orgSigned-off-by: default avatarDominique Martinet <dominique.martinet@cea.fr>
      Reported-by: syzbot+d4252148d198410b864f@syzkaller.appspotmail.com
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18280c12
    • Dominique Martinet's avatar
      9p locks: fix glock.client_id leak in do_lock · 491fc097
      Dominique Martinet authored
      [ Upstream commit b4dc44b3 ]
      
      the 9p client code overwrites our glock.client_id pointing to a static
      buffer by an allocated string holding the network provided value which
      we do not care about; free and reset the value as appropriate.
      
      This is almost identical to the leak in v9fs_file_getlock() fixed by
      Al Viro in commit ce85dd58 ("9p: we are leaking glock.client_id
      in v9fs_file_getlock()"), which was returned as an error by a coverity
      false positive -- while we are here attempt to make the code slightly
      more robust to future change of the net/9p/client code and hopefully
      more clear to coverity that there is no problem.
      
      Link: http://lkml.kernel.org/r/1536339057-21974-5-git-send-email-asmadeus@codewreck.orgSigned-off-by: default avatarDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      491fc097
    • Alexandru Ardelean's avatar
      staging:iio:ad7606: fix voltage scales · 442b5429
      Alexandru Ardelean authored
      [ Upstream commit 4ee03330 ]
      
      Fixes commit 17be2a29 ("staging: iio:
      ad7606: replace range/range_available with corresponding scale").
      
      The AD7606 devices don't have a 2.5V voltage range, they have 5V & 10V
      voltage range, which is selectable via the `gpio_range` descriptor.
      
      The scales also seem to have been miscomputed, because when they were
      applied to the raw values, the results differ from the expected values.
      After checking the ADC transfer function in the datasheet, these were
      re-computed.
      Signed-off-by: default avatarAlexandru Ardelean <alexandru.ardelean@analog.com>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      442b5429
    • Breno Leitao's avatar
      powerpc/selftests: Wait all threads to join · d3835bb8
      Breno Leitao authored
      [ Upstream commit 693b31b2 ]
      
      Test tm-tmspr might exit before all threads stop executing, because it just
      waits for the very last thread to join before proceeding/exiting.
      
      This patch makes sure that all threads that were created will join before
      proceeding/exiting.
      
      This patch also guarantees that the amount of threads being created is equal
      to thread_num.
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3835bb8
    • Marco Felsch's avatar
      media: tvp5150: fix width alignment during set_selection() · 3e59ed2e
      Marco Felsch authored
      [ Upstream commit bd24db04 ]
      
      The driver ignored the width alignment which exists due to the UYVY
      colorspace format. Fix the width alignment and make use of the the
      provided v4l2 helper function to set the width, height and all
      alignments in one.
      
      Fixes: 963ddc63 ("[media] media: tvp5150: Add cropping support")
      Signed-off-by: default avatarMarco Felsch <m.felsch@pengutronix.de>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e59ed2e
    • Phil Elwell's avatar
      sc16is7xx: Fix for multi-channel stall · 4d0df50d
      Phil Elwell authored
      [ Upstream commit 83444987 ]
      
      The SC16IS752 is a dual-channel device. The two channels are largely
      independent, but the IRQ signals are wired together as an open-drain,
      active low signal which will be driven low while either of the
      channels requires attention, which can be for significant periods of
      time until operations complete and the interrupt can be acknowledged.
      In that respect it is should be treated as a true level-sensitive IRQ.
      
      The kernel, however, needs to be able to exit interrupt context in
      order to use I2C or SPI to access the device registers (which may
      involve sleeping).  Therefore the interrupt needs to be masked out or
      paused in some way.
      
      The usual way to manage sleeping from within an interrupt handler
      is to use a threaded interrupt handler - a regular interrupt routine
      does the minimum amount of work needed to triage the interrupt before
      waking the interrupt service thread. If the threaded IRQ is marked as
      IRQF_ONESHOT the kernel will automatically mask out the interrupt
      until the thread runs to completion. The sc16is7xx driver used to
      use a threaded IRQ, but a patch switched to using a kthread_worker
      in order to set realtime priorities on the handler thread and for
      other optimisations. The end result is non-threaded IRQ that
      schedules some work then returns IRQ_HANDLED, making the kernel
      think that all IRQ processing has completed.
      
      The work-around to prevent a constant stream of interrupts is to
      mark the interrupt as edge-sensitive rather than level-sensitive,
      but interpreting an active-low source as a falling-edge source
      requires care to prevent a total cessation of interrupts. Whereas
      an edge-triggering source will generate a new edge for every interrupt
      condition a level-triggering source will keep the signal at the
      interrupting level until it no longer requires attention; in other
      words, the host won't see another edge until all interrupt conditions
      are cleared. It is therefore vital that the interrupt handler does not
      exit with an outstanding interrupt condition, otherwise the kernel
      will not receive another interrupt unless some other operation causes
      the interrupt state on the device to be cleared.
      
      The existing sc16is7xx driver has a very simple interrupt "thread"
      (kthread_work job) that processes interrupts on each channel in turn
      until there are no more. If both channels are active and the first
      channel starts interrupting while the handler for the second channel
      is running then it will not be detected and an IRQ stall ensues. This
      could be handled easily if there was a shared IRQ status register, or
      a convenient way to determine if the IRQ had been deasserted for any
      length of time, but both appear to be lacking.
      
      Avoid this problem (or at least make it much less likely to happen)
      by reducing the granularity of per-channel interrupt processing
      to one condition per iteration, only exiting the overall loop when
      both channels are no longer interrupting.
      Signed-off-by: default avatarPhil Elwell <phil@raspberrypi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d0df50d
    • Huacai Chen's avatar
      MIPS/PCI: Call pcie_bus_configure_settings() to set MPS/MRRS · f9cb913a
      Huacai Chen authored
      [ Upstream commit 2794f688 ]
      
      Call pcie_bus_configure_settings() on MIPS, like for other platforms.
      The function pcie_bus_configure_settings() makes sure the MPS (Max
      Payload Size) across the bus is uniform and provides the ability to
      tune the MRSS (Max Read Request Size) and MPS (Max Payload Size) to
      higher performance values. Some devices will not operate properly if
      these aren't set correctly because the firmware doesn't always do it.
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Patchwork: https://patchwork.linux-mips.org/patch/20649/
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: Huacai Chen <chenhuacai@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9cb913a
    • Rashmica Gupta's avatar
      powerpc/memtrace: Remove memory in chunks · 98372139
      Rashmica Gupta authored
      [ Upstream commit 3f7daf3d ]
      
      When hot-removing memory release_mem_region_adjustable() splits iomem
      resources if they are not the exact size of the memory being
      hot-deleted. Adding this memory back to the kernel adds a new resource.
      
      Eg a node has memory 0x0 - 0xfffffffff. Hot-removing 1GB from
      0xf40000000 results in the single resource 0x0-0xfffffffff being split
      into two resources: 0x0-0xf3fffffff and 0xf80000000-0xfffffffff.
      
      When we hot-add the memory back we now have three resources:
      0x0-0xf3fffffff, 0xf40000000-0xf7fffffff, and 0xf80000000-0xfffffffff.
      
      This is an issue if we try to remove some memory that overlaps
      resources. Eg when trying to remove 2GB at address 0xf40000000,
      release_mem_region_adjustable() fails as it expects the chunk of memory
      to be within the boundaries of a single resource. We then get the
      warning: "Unable to release resource" and attempting to use memtrace
      again gives us this error: "bash: echo: write error: Resource
      temporarily unavailable"
      
      This patch makes memtrace remove memory in chunks that are always the
      same size from an address that is always equal to end_of_memory -
      n*size, for some n. So hotremoving and hotadding memory of different
      sizes will now not attempt to remove memory that spans multiple
      resources.
      Signed-off-by: default avatarRashmica Gupta <rashmica.g@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98372139
    • Joel Stanley's avatar
      powerpc/boot: Ensure _zimage_start is a weak symbol · f386e1e5
      Joel Stanley authored
      [ Upstream commit ee9d21b3 ]
      
      When building with clang crt0's _zimage_start is not marked weak, which
      breaks the build when linking the kernel image:
      
       $ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
       0000000000000058 g       .text  0000000000000000 _zimage_start
      
       ld: arch/powerpc/boot/wrapper.a(crt0.o): in function '_zimage_start':
       (.text+0x58): multiple definition of '_zimage_start';
       arch/powerpc/boot/pseries-head.o:(.text+0x0): first defined here
      
      Clang requires the .weak directive to appear after the symbol is
      declared. The binutils manual says:
      
       This directive sets the weak attribute on the comma separated list of
       symbol names. If the symbols do not already exist, they will be
       created.
      
      So it appears this is different with clang. The only reference I could
      see for this was an OpenBSD mailing list post[1].
      
      Changing it to be after the declaration fixes building with Clang, and
      still works with GCC.
      
       $ objdump -t arch/powerpc/boot/crt0.o |grep _zimage_start$
       0000000000000058  w      .text	0000000000000000 _zimage_start
      
      Reported to clang as https://bugs.llvm.org/show_bug.cgi?id=38921
      
      [1] https://groups.google.com/forum/#!topic/fa.openbsd.tech/PAgKKen2YCYSigned-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f386e1e5
    • Dengcheng Zhu's avatar
      MIPS: kexec: Mark CPU offline before disabling local IRQ · 806be82c
      Dengcheng Zhu authored
      [ Upstream commit dc57aaf9 ]
      
      After changing CPU online status, it will not be sent any IPIs such as in
      __flush_cache_all() on software coherency systems. Do this before disabling
      local IRQ.
      Signed-off-by: default avatarDengcheng Zhu <dzhu@wavecomp.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Patchwork: https://patchwork.linux-mips.org/patch/20571/
      Cc: pburton@wavecomp.com
      Cc: ralf@linux-mips.org
      Cc: linux-mips@linux-mips.org
      Cc: rachel.mozes@intel.com
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      806be82c
    • Lucas Stach's avatar
      media: coda: don't overwrite h.264 profile_idc on decoder instance · e56c482e
      Lucas Stach authored
      [ Upstream commit 1f32061e ]
      
      On a decoder instance, after the profile has been parsed from the stream
      __v4l2_ctrl_s_ctrl() is called to notify userspace about changes in the
      read-only profile control. This ends up calling back into the CODA driver
      where a missing check on the s_ctrl caused the profile information that has
      just been parsed from the stream to be overwritten with the default
      baseline profile.
      
      Later on the driver fails to enable frame reordering, based on the wrong
      profile information.
      
      Fixes: 347de126d1da (media: coda: add read-only h.264 decoder
                           profile/level controls)
      Signed-off-by: default avatarLucas Stach <l.stach@pengutronix.de>
      Reviewed-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e56c482e
    • Nicholas Mc Guire's avatar
      media: pci: cx23885: handle adding to list failure · 27f61294
      Nicholas Mc Guire authored
      [ Upstream commit c5d59528 ]
      
      altera_hw_filt_init() which calls append_internal() assumes
      that the node was successfully linked in while in fact it can
      silently fail. So the call-site needs to set return to -ENOMEM
      on append_internal() returning NULL and exit through the err path.
      
      Fixes: 349bcf02 ("[media] Altera FPGA based CI driver module")
      Signed-off-by: default avatarNicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27f61294
    • John Garry's avatar
      drm/hisilicon: hibmc: Do not carry error code in HiBMC framebuffer pointer · 18490ea7
      John Garry authored
      [ Upstream commit 331d880b ]
      
      In hibmc_drm_fb_create(), when the call to hibmc_framebuffer_init() fails
      with error, do not store the error code in the HiBMC device frame-buffer
      pointer, as this will be later checked for non-zero value in
      hibmc_fbdev_destroy() when our intention is to check for a valid function
      pointer.
      
      This fixes the following crash:
      [    9.699791] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001a
      [    9.708672] Mem abort info:
      [    9.711489]   ESR = 0x96000004
      [    9.714570]   Exception class = DABT (current EL), IL = 32 bits
      [    9.720551]   SET = 0, FnV = 0
      [    9.723631]   EA = 0, S1PTW = 0
      [    9.726799] Data abort info:
      [    9.729702]   ISV = 0, ISS = 0x00000004
      [    9.733573]   CM = 0, WnR = 0
      [    9.736566] [000000000000001a] user address but active_mm is swapper
      [    9.742987] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [    9.748614] Modules linked in:
      [    9.751694] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: G        W         4.19.0-rc4-next-20180920-00001-g9b0012c #322
      [    9.762681] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 Nemo 2.0 RC0 04/18/2018
      [    9.771915] Workqueue: events work_for_cpu_fn
      [    9.776312] pstate: 60000005 (nZCv daif -PAN -UAO)
      [    9.781150] pc : drm_mode_object_put+0x0/0x20
      [    9.785547] lr : hibmc_fbdev_fini+0x40/0x58
      [    9.789767] sp : ffff00000af1bcf0
      [    9.793108] x29: ffff00000af1bcf0 x28: 0000000000000000
      [    9.798473] x27: 0000000000000000 x26: ffff000008f66630
      [    9.803838] x25: 0000000000000000 x24: ffff0000095abb98
      [    9.809203] x23: ffff8017db92fe00 x22: ffff8017d2b13000
      [    9.814568] x21: ffffffffffffffea x20: ffff8017d2f80018
      [    9.819933] x19: ffff8017d28a0018 x18: ffffffffffffffff
      [    9.825297] x17: 0000000000000000 x16: 0000000000000000
      [    9.830662] x15: ffff0000092296c8 x14: ffff00008939970f
      [    9.836026] x13: ffff00000939971d x12: ffff000009229940
      [    9.841391] x11: ffff0000085f8fc0 x10: ffff00000af1b9a0
      [    9.846756] x9 : 000000000000000d x8 : 6620657a696c6169
      [    9.852121] x7 : ffff8017d3340580 x6 : ffff8017d4168000
      [    9.857486] x5 : 0000000000000000 x4 : ffff8017db92fb20
      [    9.862850] x3 : 0000000000002690 x2 : ffff8017d3340480
      [    9.868214] x1 : 0000000000000028 x0 : 0000000000000002
      [    9.873580] Process kworker/16:1 (pid: 293, stack limit = 0x(____ptrval____))
      [    9.880788] Call trace:
      [    9.883252]  drm_mode_object_put+0x0/0x20
      [    9.887297]  hibmc_unload+0x1c/0x80
      [    9.890815]  hibmc_pci_probe+0x170/0x3c8
      [    9.894773]  local_pci_probe+0x3c/0xb0
      [    9.898555]  work_for_cpu_fn+0x18/0x28
      [    9.902337]  process_one_work+0x1e0/0x318
      [    9.906382]  worker_thread+0x228/0x450
      [    9.910164]  kthread+0x128/0x130
      [    9.913418]  ret_from_fork+0x10/0x18
      [    9.917024] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
      [    9.923180] ---[ end trace 2695ffa0af5be375 ]---
      
      Fixes: d1667b86 ("drm/hisilicon/hibmc: Add support for frame buffer")
      Signed-off-by: default avatarJohn Garry <john.garry@huawei.com>
      Reviewed-by: default avatarXinliang Liu <z.liuxinliang@hisilicon.com>
      Signed-off-by: default avatarXinliang Liu <z.liuxinliang@hisilicon.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18490ea7
    • Tomi Valkeinen's avatar
      drm/omap: fix memory barrier bug in DMM driver · 547d528e
      Tomi Valkeinen authored
      [ Upstream commit 538f66ba ]
      
      A DMM timeout "timed out waiting for done" has been observed on DRA7
      devices. The timeout happens rarely, and only when the system is under
      heavy load.
      
      Debugging showed that the timeout can be made to happen much more
      frequently by optimizing the DMM driver, so that there's almost no code
      between writing the last DMM descriptors to RAM, and writing to DMM
      register which starts the DMM transaction.
      
      The current theory is that a wmb() does not properly ensure that the
      data written to RAM is observable by all the components in the system.
      
      This DMM timeout has caused interesting (and rare) bugs as the error
      handling was not functioning properly (the error handling has been fixed
      in previous commits):
      
       * If a DMM timeout happened when a GEM buffer was being pinned for
         display on the screen, a timeout error would be shown, but the driver
         would continue programming DSS HW with broken buffer, leading to
         SYNCLOST floods and possible crashes.
      
       * If a DMM timeout happened when other user (say, video decoder) was
         pinning a GEM buffer, a timeout would be shown but if the user
         handled the error properly, no other issues followed.
      
       * If a DMM timeout happened when a GEM buffer was being released, the
         driver does not even notice the error, leading to crashes or hang
         later.
      
      This patch adds wmb() and readl() calls after the last bit is written to
      RAM, which should ensure that the execution proceeds only after the data
      is actually in RAM, and thus observable by DMM.
      
      The read-back should not be needed. Further study is required to understand
      if DMM is somehow special case and read-back is ok, or if DRA7's memory
      barriers do not work correctly.
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      547d528e
    • Christophe Leroy's avatar
      powerpc/mm: Don't report hugepage tables as memory leaks when using kmemleak · 4cb59243
      Christophe Leroy authored
      [ Upstream commit 803d690e ]
      
      When a process allocates a hugepage, the following leak is
      reported by kmemleak. This is a false positive which is
      due to the pointer to the table being stored in the PGD
      as physical memory address and not virtual memory pointer.
      
      unreferenced object 0xc30f8200 (size 512):
        comm "mmap", pid 374, jiffies 4872494 (age 627.630s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<e32b68da>] huge_pte_alloc+0xdc/0x1f8
          [<9e0df1e1>] hugetlb_fault+0x560/0x8f8
          [<7938ec6c>] follow_hugetlb_page+0x14c/0x44c
          [<afbdb405>] __get_user_pages+0x1c4/0x3dc
          [<b8fd7cd9>] __mm_populate+0xac/0x140
          [<3215421e>] vm_mmap_pgoff+0xb4/0xb8
          [<c148db69>] ksys_mmap_pgoff+0xcc/0x1fc
          [<4fcd760f>] ret_from_syscall+0x0/0x38
      
      See commit a984506c ("powerpc/mm: Don't report PUDs as
      memory leaks when using kmemleak") for detailed explanation.
      
      To fix that, this patch tells kmemleak to ignore the allocated
      hugepage table.
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4cb59243
    • Daniel Axtens's avatar
      powerpc/nohash: fix undefined behaviour when testing page size support · 3374b0b1
      Daniel Axtens authored
      [ Upstream commit f5e28480 ]
      
      When enumerating page size definitions to check hardware support,
      we construct a constant which is (1U << (def->shift - 10)).
      
      However, the array of page size definitions is only initalised for
      various MMU_PAGE_* constants, so it contains a number of 0-initialised
      elements with def->shift == 0. This means we end up shifting by a
      very large number, which gives the following UBSan splat:
      
      ================================================================================
      UBSAN: Undefined behaviour in /home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21
      shift exponent 4294967286 is too large for 32-bit type 'unsigned int'
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6
      Call Trace:
      [c00000000101bc20] [c000000000a13d54] .dump_stack+0xa8/0xec (unreliable)
      [c00000000101bcb0] [c0000000004f20a8] .ubsan_epilogue+0x18/0x64
      [c00000000101bd30] [c0000000004f2b10] .__ubsan_handle_shift_out_of_bounds+0x110/0x1a4
      [c00000000101be20] [c000000000d21760] .early_init_mmu+0x1b4/0x5a0
      [c00000000101bf10] [c000000000d1ba28] .early_setup+0x100/0x130
      [c00000000101bf90] [c000000000000528] start_here_multiplatform+0x68/0x80
      ================================================================================
      
      Fix this by first checking if the element exists (shift != 0) before
      constructing the constant.
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3374b0b1
    • Fabio Estevam's avatar
      ARM: imx_v6_v7_defconfig: Select CONFIG_TMPFS_POSIX_ACL · 6c2449a0
      Fabio Estevam authored
      [ Upstream commit 35d3cbe8 ]
      
      Andreas Müller reports:
      
      "Fixes:
      
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[220]: Failed to apply ACL on /dev/v4l-subdev0: Operation not supported
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[224]: Failed to apply ACL on /dev/v4l-subdev1: Operation not supported
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[215]: Failed to apply ACL on /dev/v4l-subdev10: Operation not supported
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[228]: Failed to apply ACL on /dev/v4l-subdev2: Operation not supported
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[232]: Failed to apply ACL on /dev/v4l-subdev5: Operation not supported
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[217]: Failed to apply ACL on /dev/v4l-subdev11: Operation not supported
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[214]: Failed to apply ACL on /dev/dri/card1: Operation not supported
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[216]: Failed to apply ACL on /dev/v4l-subdev8: Operation not supported
      | Sep 04 09:05:10 imx6qdl-variscite-som systemd-udevd[226]: Failed to apply ACL on /dev/v4l-subdev9: Operation not supported
      
      and nasty follow-ups: Starting weston from sddm as unpriviledged user fails
      with some hints on missing access rights."
      
      Select the CONFIG_TMPFS_POSIX_ACL option to fix these issues.
      Reported-by: default avatarAndreas Müller <schnitzeltony@gmail.com>
      Signed-off-by: default avatarFabio Estevam <fabio.estevam@nxp.com>
      Acked-by: default avatarOtavio Salvador <otavio@ossystems.com.br>
      Signed-off-by: default avatarShawn Guo <shawnguo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c2449a0
    • Miles Chen's avatar
      tty: check name length in tty_find_polling_driver() · 8c5800cd
      Miles Chen authored
      [ Upstream commit 33a1a7be ]
      
      The issue is found by a fuzzing test.
      If tty_find_polling_driver() recevies an incorrect input such as
      ',,' or '0b', the len becomes 0 and strncmp() always return 0.
      In this case, a null p->ops->poll_init() is called and it causes a kernel
      panic.
      
      Fix this by checking name length against zero in tty_find_polling_driver().
      
      $echo ,, > /sys/module/kgdboc/parameters/kgdboc
      [   20.804451] WARNING: CPU: 1 PID: 104 at drivers/tty/serial/serial_core.c:457
      uart_get_baud_rate+0xe8/0x190
      [   20.804917] Modules linked in:
      [   20.805317] CPU: 1 PID: 104 Comm: sh Not tainted 4.19.0-rc7ajb #8
      [   20.805469] Hardware name: linux,dummy-virt (DT)
      [   20.805732] pstate: 20000005 (nzCv daif -PAN -UAO)
      [   20.805895] pc : uart_get_baud_rate+0xe8/0x190
      [   20.806042] lr : uart_get_baud_rate+0xc0/0x190
      [   20.806476] sp : ffffffc06acff940
      [   20.806676] x29: ffffffc06acff940 x28: 0000000000002580
      [   20.806977] x27: 0000000000009600 x26: 0000000000009600
      [   20.807231] x25: ffffffc06acffad0 x24: 00000000ffffeff0
      [   20.807576] x23: 0000000000000001 x22: 0000000000000000
      [   20.807807] x21: 0000000000000001 x20: 0000000000000000
      [   20.808049] x19: ffffffc06acffac8 x18: 0000000000000000
      [   20.808277] x17: 0000000000000000 x16: 0000000000000000
      [   20.808520] x15: ffffffffffffffff x14: ffffffff00000000
      [   20.808757] x13: ffffffffffffffff x12: 0000000000000001
      [   20.809011] x11: 0101010101010101 x10: ffffff880d59ff5f
      [   20.809292] x9 : ffffff880d59ff5e x8 : ffffffc06acffaf3
      [   20.809549] x7 : 0000000000000000 x6 : ffffff880d59ff5f
      [   20.809803] x5 : 0000000080008001 x4 : 0000000000000003
      [   20.810056] x3 : ffffff900853e6b4 x2 : dfffff9000000000
      [   20.810693] x1 : ffffffc06acffad0 x0 : 0000000000000cb0
      [   20.811005] Call trace:
      [   20.811214]  uart_get_baud_rate+0xe8/0x190
      [   20.811479]  serial8250_do_set_termios+0xe0/0x6f4
      [   20.811719]  serial8250_set_termios+0x48/0x54
      [   20.811928]  uart_set_options+0x138/0x1bc
      [   20.812129]  uart_poll_init+0x114/0x16c
      [   20.812330]  tty_find_polling_driver+0x158/0x200
      [   20.812545]  configure_kgdboc+0xbc/0x1bc
      [   20.812745]  param_set_kgdboc_var+0xb8/0x150
      [   20.812960]  param_attr_store+0xbc/0x150
      [   20.813160]  module_attr_store+0x40/0x58
      [   20.813364]  sysfs_kf_write+0x8c/0xa8
      [   20.813563]  kernfs_fop_write+0x154/0x290
      [   20.813764]  vfs_write+0xf0/0x278
      [   20.813951]  __arm64_sys_write+0x84/0xf4
      [   20.814400]  el0_svc_common+0xf4/0x1dc
      [   20.814616]  el0_svc_handler+0x98/0xbc
      [   20.814804]  el0_svc+0x8/0xc
      [   20.822005] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [   20.826913] Mem abort info:
      [   20.827103]   ESR = 0x84000006
      [   20.827352]   Exception class = IABT (current EL), IL = 16 bits
      [   20.827655]   SET = 0, FnV = 0
      [   20.827855]   EA = 0, S1PTW = 0
      [   20.828135] user pgtable: 4k pages, 39-bit VAs, pgdp = (____ptrval____)
      [   20.828484] [0000000000000000] pgd=00000000aadee003, pud=00000000aadee003, pmd=0000000000000000
      [   20.829195] Internal error: Oops: 84000006 [#1] SMP
      [   20.829564] Modules linked in:
      [   20.829890] CPU: 1 PID: 104 Comm: sh Tainted: G        W         4.19.0-rc7ajb #8
      [   20.830545] Hardware name: linux,dummy-virt (DT)
      [   20.830829] pstate: 60000085 (nZCv daIf -PAN -UAO)
      [   20.831174] pc :           (null)
      [   20.831457] lr : serial8250_do_set_termios+0x358/0x6f4
      [   20.831727] sp : ffffffc06acff9b0
      [   20.831936] x29: ffffffc06acff9b0 x28: ffffff9008d7c000
      [   20.832267] x27: ffffff900969e16f x26: 0000000000000000
      [   20.832589] x25: ffffff900969dfb0 x24: 0000000000000000
      [   20.832906] x23: ffffffc06acffad0 x22: ffffff900969e160
      [   20.833232] x21: 0000000000000000 x20: ffffffc06acffac8
      [   20.833559] x19: ffffff900969df90 x18: 0000000000000000
      [   20.833878] x17: 0000000000000000 x16: 0000000000000000
      [   20.834491] x15: ffffffffffffffff x14: ffffffff00000000
      [   20.834821] x13: ffffffffffffffff x12: 0000000000000001
      [   20.835143] x11: 0101010101010101 x10: ffffff880d59ff5f
      [   20.835467] x9 : ffffff880d59ff5e x8 : ffffffc06acffaf3
      [   20.835790] x7 : 0000000000000000 x6 : ffffff880d59ff5f
      [   20.836111] x5 : c06419717c314100 x4 : 0000000000000007
      [   20.836419] x3 : 0000000000000000 x2 : 0000000000000000
      [   20.836732] x1 : 0000000000000001 x0 : ffffff900969df90
      [   20.837100] Process sh (pid: 104, stack limit = 0x(____ptrval____))
      [   20.837396] Call trace:
      [   20.837566]            (null)
      [   20.837816]  serial8250_set_termios+0x48/0x54
      [   20.838089]  uart_set_options+0x138/0x1bc
      [   20.838570]  uart_poll_init+0x114/0x16c
      [   20.838834]  tty_find_polling_driver+0x158/0x200
      [   20.839119]  configure_kgdboc+0xbc/0x1bc
      [   20.839380]  param_set_kgdboc_var+0xb8/0x150
      [   20.839658]  param_attr_store+0xbc/0x150
      [   20.839920]  module_attr_store+0x40/0x58
      [   20.840183]  sysfs_kf_write+0x8c/0xa8
      [   20.840183]  sysfs_kf_write+0x8c/0xa8
      [   20.840440]  kernfs_fop_write+0x154/0x290
      [   20.840702]  vfs_write+0xf0/0x278
      [   20.840942]  __arm64_sys_write+0x84/0xf4
      [   20.841209]  el0_svc_common+0xf4/0x1dc
      [   20.841471]  el0_svc_handler+0x98/0xbc
      [   20.841713]  el0_svc+0x8/0xc
      [   20.842057] Code: bad PC value
      [   20.842764] ---[ end trace a8835d7de79aaadf ]---
      [   20.843134] Kernel panic - not syncing: Fatal exception
      [   20.843515] SMP: stopping secondary CPUs
      [   20.844289] Kernel Offset: disabled
      [   20.844634] CPU features: 0x0,21806002
      [   20.844857] Memory Limit: none
      [   20.845172] ---[ end Kernel panic - not syncing: Fatal exception ]---
      Signed-off-by: default avatarMiles Chen <miles.chen@mediatek.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c5800cd
    • Sam Bobroff's avatar
      powerpc/eeh: Fix possible null deref in eeh_dump_dev_log() · 211981e7
      Sam Bobroff authored
      [ Upstream commit f9bc28ae ]
      
      If an error occurs during an unplug operation, it's possible for
      eeh_dump_dev_log() to be called when edev->pdn is null, which
      currently leads to dereferencing a null pointer.
      
      Handle this by skipping the error log for those devices.
      Signed-off-by: default avatarSam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      211981e7
    • Michael Ellerman's avatar
      powerpc/mm: Fix page table dump to work on Radix · 0b1f1204
      Michael Ellerman authored
      [ Upstream commit 0d923962 ]
      
      When we're running on Book3S with the Radix MMU enabled the page table
      dump currently prints the wrong addresses because it uses the wrong
      start address.
      
      Fix it to use PAGE_OFFSET rather than KERN_VIRT_START.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b1f1204
    • Nicholas Piggin's avatar
      powerpc/64/module: REL32 relocation range check · 2b9aed7c
      Nicholas Piggin authored
      [ Upstream commit b851ba02 ]
      
      The recent module relocation overflow crash demonstrated that we
      have no range checking on REL32 relative relocations. This patch
      implements a basic check, the same kernel that previously oopsed
      and rebooted now continues with some of these errors when loading
      the module:
      
        module_64: x_tables: REL32 527703503449812 out of range!
      
      Possibly other relocations (ADDR32, REL16, TOC16, etc.) should also have
      overflow checks.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b9aed7c
    • Christophe Leroy's avatar
      powerpc/traps: restore recoverability of machine_check interrupts · 90c68f71
      Christophe Leroy authored
      [ Upstream commit daf00ae7 ]
      
      commit b96672dd ("powerpc: Machine check interrupt is a non-
      maskable interrupt") added a call to nmi_enter() at the beginning of
      machine check restart exception handler. Due to that, in_interrupt()
      always returns true regardless of the state before entering the
      exception, and die() panics even when the system was not already in
      interrupt.
      
      This patch calls nmi_exit() before calling die() in order to restore
      the interrupt state we had before calling nmi_enter()
      
      Fixes: b96672dd ("powerpc: Machine check interrupt is a non-maskable interrupt")
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90c68f71
  2. 13 Nov, 2018 16 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.14.81 · 2e390c48
      Greg Kroah-Hartman authored
      2e390c48
    • Shaohua Li's avatar
      MD: fix invalid stored role for a disk - try2 · c3cd0a4f
      Shaohua Li authored
      commit 9e753ba9 upstream.
      
      Commit d595567d (MD: fix invalid stored role for a disk) broke linear
      hotadd. Let's only fix the role for disks in raid1/10.
      Based on Guoqing's original patch.
      Reported-by: default avatarkernel test robot <rong.a.chen@intel.com>
      Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
      Cc: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3cd0a4f
    • Daniel Colascione's avatar
      bpf: wait for running BPF programs when updating map-in-map · 4bea15f7
      Daniel Colascione authored
      commit 1ae80cf3 upstream.
      
      The map-in-map frequently serves as a mechanism for atomic
      snapshotting of state that a BPF program might record.  The current
      implementation is dangerous to use in this way, however, since
      userspace has no way of knowing when all programs that might have
      retrieved the "old" value of the map may have completed.
      
      This change ensures that map update operations on map-in-map map types
      always wait for all references to the old map to drop before returning
      to userspace.
      Signed-off-by: default avatarDaniel Colascione <dancol@google.com>
      Reviewed-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [fengc@google.com: 4.14 backport: adjust context]
      Signed-off-by: default avatarChenbo Feng <fengc@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4bea15f7
    • David Ahern's avatar
      net: sched: Remove TCA_OPTIONS from policy · dfba4092
      David Ahern authored
      commit e72bde6b upstream.
      
      Marco reported an error with hfsc:
      root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
      Error: Attribute failed policy validation.
      
      Apparently a few implementations pass TCA_OPTIONS as a binary instead
      of nested attribute, so drop TCA_OPTIONS from the policy.
      
      Fixes: 8b4c3cdd ("net: sched: Add policy validation for tc attributes")
      Reported-by: default avatarMarco Berizzi <pupilla@libero.it>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfba4092
    • Filipe Manana's avatar
      Btrfs: fix fsync after hole punching when using no-holes feature · fc8a236e
      Filipe Manana authored
      commit 4ee3fad3 upstream.
      
      When we have the no-holes mode enabled and fsync a file after punching a
      hole in it, we can end up not logging the whole hole range in the log tree.
      This happens if the file has extent items that span more than one leaf and
      we punch a hole that covers a range that starts in a leaf but does not go
      beyond the offset of the first extent in the next leaf.
      
      Example:
      
        $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb
        $ mount /dev/sdb /mnt
        $ for ((i = 0; i <= 831; i++)); do
      	offset=$((i * 2 * 256 * 1024))
      	xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \
      		/mnt/foobar >/dev/null
          done
        $ sync
      
        # We now have 2 leafs in our filesystem fs tree, the first leaf has an
        # item corresponding the extent at file offset 216530944 and the second
        # leaf has a first item corresponding to the extent at offset 217055232.
        # Now we punch a hole that partially covers the range of the extent at
        # offset 216530944 but does go beyond the offset 217055232.
      
        $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar
        $ xfs_io -c "fsync" /mnt/foobar
      
        <power fail>
      
        # mount to replay the log
        $ mount /dev/sdb /mnt
      
        # Before this patch, only the subrange [216658016, 216662016[ (length of
        # 4000 bytes) was logged, leaving an incorrect file layout after log
        # replay.
      
      Fix this by checking if there is a hole between the last extent item that
      we processed and the first extent item in the next leaf, and if there is
      one, log an explicit hole extent item.
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc8a236e
    • Filipe Manana's avatar
      Btrfs: fix use-after-free when dumping free space · 3c2d1ef3
      Filipe Manana authored
      commit 9084cb6a upstream.
      
      We were iterating a block group's free space cache rbtree without locking
      first the lock that protects it (the free_space_ctl->free_space_offset
      rbtree is protected by the free_space_ctl->tree_lock spinlock).
      
      KASAN reported an use-after-free problem when iterating such a rbtree due
      to a concurrent rbtree delete:
      
      [ 9520.359168] ==================================================================
      [ 9520.359656] BUG: KASAN: use-after-free in rb_next+0x13/0x90
      [ 9520.359949] Read of size 8 at addr ffff8800b7ada500 by task btrfs-transacti/1721
      [ 9520.360357]
      [ 9520.360530] CPU: 4 PID: 1721 Comm: btrfs-transacti Tainted: G             L    4.19.0-rc8-nbor #555
      [ 9520.360990] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [ 9520.362682] Call Trace:
      [ 9520.362887]  dump_stack+0xa4/0xf5
      [ 9520.363146]  print_address_description+0x78/0x280
      [ 9520.363412]  kasan_report+0x263/0x390
      [ 9520.363650]  ? rb_next+0x13/0x90
      [ 9520.363873]  __asan_load8+0x54/0x90
      [ 9520.364102]  rb_next+0x13/0x90
      [ 9520.364380]  btrfs_dump_free_space+0x146/0x160 [btrfs]
      [ 9520.364697]  dump_space_info+0x2cd/0x310 [btrfs]
      [ 9520.364997]  btrfs_reserve_extent+0x1ee/0x1f0 [btrfs]
      [ 9520.365310]  __btrfs_prealloc_file_range+0x1cc/0x620 [btrfs]
      [ 9520.365646]  ? btrfs_update_time+0x180/0x180 [btrfs]
      [ 9520.365923]  ? _raw_spin_unlock+0x27/0x40
      [ 9520.366204]  ? btrfs_alloc_data_chunk_ondemand+0x2c0/0x5c0 [btrfs]
      [ 9520.366549]  btrfs_prealloc_file_range_trans+0x23/0x30 [btrfs]
      [ 9520.366880]  cache_save_setup+0x42e/0x580 [btrfs]
      [ 9520.367220]  ? btrfs_check_data_free_space+0xd0/0xd0 [btrfs]
      [ 9520.367518]  ? lock_downgrade+0x2f0/0x2f0
      [ 9520.367799]  ? btrfs_write_dirty_block_groups+0x11f/0x6e0 [btrfs]
      [ 9520.368104]  ? kasan_check_read+0x11/0x20
      [ 9520.368349]  ? do_raw_spin_unlock+0xa8/0x140
      [ 9520.368638]  btrfs_write_dirty_block_groups+0x2af/0x6e0 [btrfs]
      [ 9520.368978]  ? btrfs_start_dirty_block_groups+0x870/0x870 [btrfs]
      [ 9520.369282]  ? do_raw_spin_unlock+0xa8/0x140
      [ 9520.369534]  ? _raw_spin_unlock+0x27/0x40
      [ 9520.369811]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
      [ 9520.370137]  commit_cowonly_roots+0x4b9/0x610 [btrfs]
      [ 9520.370560]  ? commit_fs_roots+0x350/0x350 [btrfs]
      [ 9520.370926]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
      [ 9520.371285]  btrfs_commit_transaction+0x5e5/0x10e0 [btrfs]
      [ 9520.371612]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
      [ 9520.371943]  ? start_transaction+0x168/0x6c0 [btrfs]
      [ 9520.372257]  transaction_kthread+0x21c/0x240 [btrfs]
      [ 9520.372537]  kthread+0x1d2/0x1f0
      [ 9520.372793]  ? btrfs_cleanup_transaction+0xb50/0xb50 [btrfs]
      [ 9520.373090]  ? kthread_park+0xb0/0xb0
      [ 9520.373329]  ret_from_fork+0x3a/0x50
      [ 9520.373567]
      [ 9520.373738] Allocated by task 1804:
      [ 9520.373974]  kasan_kmalloc+0xff/0x180
      [ 9520.374208]  kasan_slab_alloc+0x11/0x20
      [ 9520.374447]  kmem_cache_alloc+0xfc/0x2d0
      [ 9520.374731]  __btrfs_add_free_space+0x40/0x580 [btrfs]
      [ 9520.375044]  unpin_extent_range+0x4f7/0x7a0 [btrfs]
      [ 9520.375383]  btrfs_finish_extent_commit+0x15f/0x4d0 [btrfs]
      [ 9520.375707]  btrfs_commit_transaction+0xb06/0x10e0 [btrfs]
      [ 9520.376027]  btrfs_alloc_data_chunk_ondemand+0x237/0x5c0 [btrfs]
      [ 9520.376365]  btrfs_check_data_free_space+0x81/0xd0 [btrfs]
      [ 9520.376689]  btrfs_delalloc_reserve_space+0x25/0x80 [btrfs]
      [ 9520.377018]  btrfs_direct_IO+0x42e/0x6d0 [btrfs]
      [ 9520.377284]  generic_file_direct_write+0x11e/0x220
      [ 9520.377587]  btrfs_file_write_iter+0x472/0xac0 [btrfs]
      [ 9520.377875]  aio_write+0x25c/0x360
      [ 9520.378106]  io_submit_one+0xaa0/0xdc0
      [ 9520.378343]  __se_sys_io_submit+0xfa/0x2f0
      [ 9520.378589]  __x64_sys_io_submit+0x43/0x50
      [ 9520.378840]  do_syscall_64+0x7d/0x240
      [ 9520.379081]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 9520.379387]
      [ 9520.379557] Freed by task 1802:
      [ 9520.379782]  __kasan_slab_free+0x173/0x260
      [ 9520.380028]  kasan_slab_free+0xe/0x10
      [ 9520.380262]  kmem_cache_free+0xc1/0x2c0
      [ 9520.380544]  btrfs_find_space_for_alloc+0x4cd/0x4e0 [btrfs]
      [ 9520.380866]  find_free_extent+0xa99/0x17e0 [btrfs]
      [ 9520.381166]  btrfs_reserve_extent+0xd5/0x1f0 [btrfs]
      [ 9520.381474]  btrfs_get_blocks_direct+0x60b/0xbd0 [btrfs]
      [ 9520.381761]  __blockdev_direct_IO+0x10ee/0x58a1
      [ 9520.382059]  btrfs_direct_IO+0x25a/0x6d0 [btrfs]
      [ 9520.382321]  generic_file_direct_write+0x11e/0x220
      [ 9520.382623]  btrfs_file_write_iter+0x472/0xac0 [btrfs]
      [ 9520.382904]  aio_write+0x25c/0x360
      [ 9520.383172]  io_submit_one+0xaa0/0xdc0
      [ 9520.383416]  __se_sys_io_submit+0xfa/0x2f0
      [ 9520.383678]  __x64_sys_io_submit+0x43/0x50
      [ 9520.383927]  do_syscall_64+0x7d/0x240
      [ 9520.384165]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 9520.384439]
      [ 9520.384610] The buggy address belongs to the object at ffff8800b7ada500
                      which belongs to the cache btrfs_free_space of size 72
      [ 9520.385175] The buggy address is located 0 bytes inside of
                      72-byte region [ffff8800b7ada500, ffff8800b7ada548)
      [ 9520.385691] The buggy address belongs to the page:
      [ 9520.385957] page:ffffea0002deb680 count:1 mapcount:0 mapping:ffff880108a1d700 index:0x0 compound_mapcount: 0
      [ 9520.388030] flags: 0x8100(slab|head)
      [ 9520.388281] raw: 0000000000008100 ffffea0002deb608 ffffea0002728808 ffff880108a1d700
      [ 9520.388722] raw: 0000000000000000 0000000000130013 00000001ffffffff 0000000000000000
      [ 9520.389169] page dumped because: kasan: bad access detected
      [ 9520.389473]
      [ 9520.389658] Memory state around the buggy address:
      [ 9520.389943]  ffff8800b7ada400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 9520.390368]  ffff8800b7ada480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 9520.390796] >ffff8800b7ada500: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc
      [ 9520.391223]                    ^
      [ 9520.391461]  ffff8800b7ada580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 9520.391885]  ffff8800b7ada600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 9520.392313] ==================================================================
      [ 9520.392772] BTRFS critical (device vdc): entry offset 2258497536, bytes 131072, bitmap no
      [ 9520.393247] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
      [ 9520.393705] PGD 800000010dbab067 P4D 800000010dbab067 PUD 107551067 PMD 0
      [ 9520.394059] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 9520.394378] CPU: 4 PID: 1721 Comm: btrfs-transacti Tainted: G    B        L    4.19.0-rc8-nbor #555
      [ 9520.394858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [ 9520.395350] RIP: 0010:rb_next+0x3c/0x90
      [ 9520.396461] RSP: 0018:ffff8801074ff780 EFLAGS: 00010292
      [ 9520.396762] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81b5ac4c
      [ 9520.397115] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000011
      [ 9520.397468] RBP: ffff8801074ff7a0 R08: ffffed0021d64ccc R09: ffffed0021d64ccc
      [ 9520.397821] R10: 0000000000000001 R11: ffffed0021d64ccb R12: ffff8800b91e0000
      [ 9520.398188] R13: ffff8800a3ceba48 R14: ffff8800b627bf80 R15: 0000000000020000
      [ 9520.398555] FS:  0000000000000000(0000) GS:ffff88010eb00000(0000) knlGS:0000000000000000
      [ 9520.399007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9520.399335] CR2: 0000000000000011 CR3: 0000000106b52000 CR4: 00000000000006a0
      [ 9520.399679] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 9520.400023] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 9520.400400] Call Trace:
      [ 9520.400648]  btrfs_dump_free_space+0x146/0x160 [btrfs]
      [ 9520.400974]  dump_space_info+0x2cd/0x310 [btrfs]
      [ 9520.401287]  btrfs_reserve_extent+0x1ee/0x1f0 [btrfs]
      [ 9520.401609]  __btrfs_prealloc_file_range+0x1cc/0x620 [btrfs]
      [ 9520.401952]  ? btrfs_update_time+0x180/0x180 [btrfs]
      [ 9520.402232]  ? _raw_spin_unlock+0x27/0x40
      [ 9520.402522]  ? btrfs_alloc_data_chunk_ondemand+0x2c0/0x5c0 [btrfs]
      [ 9520.402882]  btrfs_prealloc_file_range_trans+0x23/0x30 [btrfs]
      [ 9520.403261]  cache_save_setup+0x42e/0x580 [btrfs]
      [ 9520.403570]  ? btrfs_check_data_free_space+0xd0/0xd0 [btrfs]
      [ 9520.403871]  ? lock_downgrade+0x2f0/0x2f0
      [ 9520.404161]  ? btrfs_write_dirty_block_groups+0x11f/0x6e0 [btrfs]
      [ 9520.404481]  ? kasan_check_read+0x11/0x20
      [ 9520.404732]  ? do_raw_spin_unlock+0xa8/0x140
      [ 9520.405026]  btrfs_write_dirty_block_groups+0x2af/0x6e0 [btrfs]
      [ 9520.405375]  ? btrfs_start_dirty_block_groups+0x870/0x870 [btrfs]
      [ 9520.405694]  ? do_raw_spin_unlock+0xa8/0x140
      [ 9520.405958]  ? _raw_spin_unlock+0x27/0x40
      [ 9520.406243]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
      [ 9520.406574]  commit_cowonly_roots+0x4b9/0x610 [btrfs]
      [ 9520.406899]  ? commit_fs_roots+0x350/0x350 [btrfs]
      [ 9520.407253]  ? btrfs_run_delayed_refs+0x1b8/0x230 [btrfs]
      [ 9520.407589]  btrfs_commit_transaction+0x5e5/0x10e0 [btrfs]
      [ 9520.407925]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
      [ 9520.408262]  ? start_transaction+0x168/0x6c0 [btrfs]
      [ 9520.408582]  transaction_kthread+0x21c/0x240 [btrfs]
      [ 9520.408870]  kthread+0x1d2/0x1f0
      [ 9520.409138]  ? btrfs_cleanup_transaction+0xb50/0xb50 [btrfs]
      [ 9520.409440]  ? kthread_park+0xb0/0xb0
      [ 9520.409682]  ret_from_fork+0x3a/0x50
      [ 9520.410508] Dumping ftrace buffer:
      [ 9520.410764]    (ftrace buffer empty)
      [ 9520.411007] CR2: 0000000000000011
      [ 9520.411297] ---[ end trace 01a0863445cf360a ]---
      [ 9520.411568] RIP: 0010:rb_next+0x3c/0x90
      [ 9520.412644] RSP: 0018:ffff8801074ff780 EFLAGS: 00010292
      [ 9520.412932] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81b5ac4c
      [ 9520.413274] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000011
      [ 9520.413616] RBP: ffff8801074ff7a0 R08: ffffed0021d64ccc R09: ffffed0021d64ccc
      [ 9520.414007] R10: 0000000000000001 R11: ffffed0021d64ccb R12: ffff8800b91e0000
      [ 9520.414349] R13: ffff8800a3ceba48 R14: ffff8800b627bf80 R15: 0000000000020000
      [ 9520.416074] FS:  0000000000000000(0000) GS:ffff88010eb00000(0000) knlGS:0000000000000000
      [ 9520.416536] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9520.416848] CR2: 0000000000000011 CR3: 0000000106b52000 CR4: 00000000000006a0
      [ 9520.418477] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 9520.418846] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 9520.419204] Kernel panic - not syncing: Fatal exception
      [ 9520.419666] Dumping ftrace buffer:
      [ 9520.419930]    (ftrace buffer empty)
      [ 9520.420168] Kernel Offset: disabled
      [ 9520.420406] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Fix this by acquiring the respective lock before iterating the rbtree.
      Reported-by: default avatarNikolay Borisov <nborisov@suse.com>
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c2d1ef3
    • Filipe Manana's avatar
      Btrfs: fix use-after-free during inode eviction · bf005ed0
      Filipe Manana authored
      commit 421f0922 upstream.
      
      At inode.c:evict_inode_truncate_pages(), when we iterate over the
      inode's extent states, we access an extent state record's "state" field
      after we unlocked the inode's io tree lock. This can lead to a
      use-after-free issue because after we unlock the io tree that extent
      state record might have been freed due to being merged into another
      adjacent extent state record (a previous inflight bio for a read
      operation finished in the meanwhile which unlocked a range in the io
      tree and cause a merge of extent state records, as explained in the
      comment before the while loop added in commit 6ca07097 ("Btrfs: fix
      hang during inode eviction due to concurrent readahead")).
      
      Fix this by keeping a copy of the extent state's flags in a local
      variable and using it after unlocking the io tree.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201189
      Fixes: b9d0b389 ("btrfs: Add handler for invalidate page")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf005ed0
    • Josef Bacik's avatar
      btrfs: move the dio_sem higher up the callchain · c8bac97d
      Josef Bacik authored
      commit c495144b upstream.
      
      We're getting a lockdep splat because we take the dio_sem under the
      log_mutex.  What we really need is to protect fsync() from logging an
      extent map for an extent we never waited on higher up, so just guard the
      whole thing with dio_sem.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Not tainted
      ------------------------------------------------------
      aio-dio-invalid/30928 is trying to acquire lock:
      0000000092621cfd (&mm->mmap_sem){++++}, at: get_user_pages_unlocked+0x5a/0x1e0
      
      but task is already holding lock:
      00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #5 (&ei->dio_sem){++++}:
             lock_acquire+0xbd/0x220
             down_write+0x51/0xb0
             btrfs_log_changed_extents+0x80/0xa40
             btrfs_log_inode+0xbaf/0x1000
             btrfs_log_inode_parent+0x26f/0xa80
             btrfs_log_dentry_safe+0x50/0x70
             btrfs_sync_file+0x357/0x540
             do_fsync+0x38/0x60
             __ia32_sys_fdatasync+0x12/0x20
             do_fast_syscall_32+0x9a/0x2f0
             entry_SYSENTER_compat+0x84/0x96
      
      -> #4 (&ei->log_mutex){+.+.}:
             lock_acquire+0xbd/0x220
             __mutex_lock+0x86/0xa10
             btrfs_record_unlink_dir+0x2a/0xa0
             btrfs_unlink+0x5a/0xc0
             vfs_unlink+0xb1/0x1a0
             do_unlinkat+0x264/0x2b0
             do_fast_syscall_32+0x9a/0x2f0
             entry_SYSENTER_compat+0x84/0x96
      
      -> #3 (sb_internal#2){.+.+}:
             lock_acquire+0xbd/0x220
             __sb_start_write+0x14d/0x230
             start_transaction+0x3e6/0x590
             btrfs_evict_inode+0x475/0x640
             evict+0xbf/0x1b0
             btrfs_run_delayed_iputs+0x6c/0x90
             cleaner_kthread+0x124/0x1a0
             kthread+0x106/0x140
             ret_from_fork+0x3a/0x50
      
      -> #2 (&fs_info->cleaner_delayed_iput_mutex){+.+.}:
             lock_acquire+0xbd/0x220
             __mutex_lock+0x86/0xa10
             btrfs_alloc_data_chunk_ondemand+0x197/0x530
             btrfs_check_data_free_space+0x4c/0x90
             btrfs_delalloc_reserve_space+0x20/0x60
             btrfs_page_mkwrite+0x87/0x520
             do_page_mkwrite+0x31/0xa0
             __handle_mm_fault+0x799/0xb00
             handle_mm_fault+0x7c/0xe0
             __do_page_fault+0x1d3/0x4a0
             async_page_fault+0x1e/0x30
      
      -> #1 (sb_pagefaults){.+.+}:
             lock_acquire+0xbd/0x220
             __sb_start_write+0x14d/0x230
             btrfs_page_mkwrite+0x6a/0x520
             do_page_mkwrite+0x31/0xa0
             __handle_mm_fault+0x799/0xb00
             handle_mm_fault+0x7c/0xe0
             __do_page_fault+0x1d3/0x4a0
             async_page_fault+0x1e/0x30
      
      -> #0 (&mm->mmap_sem){++++}:
             __lock_acquire+0x42e/0x7a0
             lock_acquire+0xbd/0x220
             down_read+0x48/0xb0
             get_user_pages_unlocked+0x5a/0x1e0
             get_user_pages_fast+0xa4/0x150
             iov_iter_get_pages+0xc3/0x340
             do_direct_IO+0xf93/0x1d70
             __blockdev_direct_IO+0x32d/0x1c20
             btrfs_direct_IO+0x227/0x400
             generic_file_direct_write+0xcf/0x180
             btrfs_file_write_iter+0x308/0x58c
             aio_write+0xf8/0x1d0
             io_submit_one+0x3a9/0x620
             __ia32_compat_sys_io_submit+0xb2/0x270
             do_int80_syscall_32+0x5b/0x1a0
             entry_INT80_compat+0x88/0xa0
      
      other info that might help us debug this:
      
      Chain exists of:
        &mm->mmap_sem --> &ei->log_mutex --> &ei->dio_sem
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&ei->dio_sem);
                                     lock(&ei->log_mutex);
                                     lock(&ei->dio_sem);
        lock(&mm->mmap_sem);
      
       *** DEADLOCK ***
      
      1 lock held by aio-dio-invalid/30928:
       #0: 00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400
      
      stack backtrace:
      CPU: 0 PID: 30928 Comm: aio-dio-invalid Not tainted 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      Call Trace:
       dump_stack+0x7c/0xbb
       print_circular_bug.isra.37+0x297/0x2a4
       check_prev_add.constprop.45+0x781/0x7a0
       ? __lock_acquire+0x42e/0x7a0
       validate_chain.isra.41+0x7f0/0xb00
       __lock_acquire+0x42e/0x7a0
       lock_acquire+0xbd/0x220
       ? get_user_pages_unlocked+0x5a/0x1e0
       down_read+0x48/0xb0
       ? get_user_pages_unlocked+0x5a/0x1e0
       get_user_pages_unlocked+0x5a/0x1e0
       get_user_pages_fast+0xa4/0x150
       iov_iter_get_pages+0xc3/0x340
       do_direct_IO+0xf93/0x1d70
       ? __alloc_workqueue_key+0x358/0x490
       ? __blockdev_direct_IO+0x14b/0x1c20
       __blockdev_direct_IO+0x32d/0x1c20
       ? btrfs_run_delalloc_work+0x40/0x40
       ? can_nocow_extent+0x490/0x490
       ? kvm_clock_read+0x1f/0x30
       ? can_nocow_extent+0x490/0x490
       ? btrfs_run_delalloc_work+0x40/0x40
       btrfs_direct_IO+0x227/0x400
       ? btrfs_run_delalloc_work+0x40/0x40
       generic_file_direct_write+0xcf/0x180
       btrfs_file_write_iter+0x308/0x58c
       aio_write+0xf8/0x1d0
       ? kvm_clock_read+0x1f/0x30
       ? __might_fault+0x3e/0x90
       io_submit_one+0x3a9/0x620
       ? io_submit_one+0xe5/0x620
       __ia32_compat_sys_io_submit+0xb2/0x270
       do_int80_syscall_32+0x5b/0x1a0
       entry_INT80_compat+0x88/0xa0
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8bac97d
    • Josef Bacik's avatar
      btrfs: don't run delayed_iputs in commit · 87d7ea68
      Josef Bacik authored
      commit 30928e9b upstream.
      
      This could result in a really bad case where we do something like
      
      evict
        evict_refill_and_join
          btrfs_commit_transaction
            btrfs_run_delayed_iputs
              evict
                evict_refill_and_join
                  btrfs_commit_transaction
      ... forever
      
      We have plenty of other places where we run delayed iputs that are much
      safer, let those do the work.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87d7ea68
    • Josef Bacik's avatar
      btrfs: only free reserved extent if we didn't insert it · 42551ab6
      Josef Bacik authored
      commit 49940bdd upstream.
      
      When we insert the file extent once the ordered extent completes we free
      the reserved extent reservation as it'll have been migrated to the
      bytes_used counter.  However if we error out after this step we'll still
      clear the reserved extent reservation, resulting in a negative
      accounting of the reserved bytes for the block group and space info.
      Fix this by only doing the free if we didn't successfully insert a file
      extent for this extent.
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42551ab6
    • Josef Bacik's avatar
      btrfs: don't use ctl->free_space for max_extent_size · 7509d4f9
      Josef Bacik authored
      commit fb5c39d7 upstream.
      
      max_extent_size is supposed to be the largest contiguous range for the
      space info, and ctl->free_space is the total free space in the block
      group.  We need to keep track of these separately and _only_ use the
      max_free_space if we don't have a max_extent_size, as that means our
      original request was too large to search any of the block groups for and
      therefore wouldn't have a max_extent_size set.
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7509d4f9
    • Josef Bacik's avatar
      btrfs: set max_extent_size properly · 65d41e98
      Josef Bacik authored
      commit ad22cf6e upstream.
      
      We can't use entry->bytes if our entry is a bitmap entry, we need to use
      entry->max_extent_size in that case.  Fix up all the logic to make this
      consistent.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65d41e98
    • Filipe Manana's avatar
      Btrfs: fix assertion on fsync of regular file when using no-holes feature · 947a9b02
      Filipe Manana authored
      commit 7ed586d0 upstream.
      
      When using the NO_HOLES feature and logging a regular file, we were
      expecting that if we find an inline extent, that either its size in RAM
      (uncompressed and unenconded) matches the size of the file or if it does
      not, that it matches the sector size and it represents compressed data.
      This assertion does not cover a case where the length of the inline extent
      is smaller than the sector size and also smaller the file's size, such
      case is possible through fallocate. Example:
      
        $ mkfs.btrfs -f -O no-holes /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ xfs_io -f -c "pwrite -S 0xb60 0 21" /mnt/foobar
        $ xfs_io -c "falloc 40 40" /mnt/foobar
        $ xfs_io -c "fsync" /mnt/foobar
      
      In the above example we trigger the assertion because the inline extent's
      length is 21 bytes while the file size is 80 bytes. The fallocate() call
      merely updated the file's size and did not touch the existing inline
      extent, as expected.
      
      So fix this by adjusting the assertion so that an inline extent length
      smaller than the file size is valid if the file size is smaller than the
      filesystem's sector size.
      
      A test case for fstests follows soon.
      Reported-by: default avatarAnatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Fixes: a89ca6f2 ("Btrfs: fix fsync after truncate when no_holes feature is enabled")
      CC: stable@vger.kernel.org # 4.14+
      Link: https://lore.kernel.org/linux-btrfs/CAE5jQCfRSBC7n4pUTFJcmHh109=gwyT9mFkCOL+NKfzswmR=_Q@mail.gmail.com/Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      947a9b02
    • Filipe Manana's avatar
      Btrfs: fix null pointer dereference on compressed write path error · bdda3175
      Filipe Manana authored
      commit 3527a018 upstream.
      
      At inode.c:compress_file_range(), under the "free_pages_out" label, we can
      end up dereferencing the "pages" pointer when it has a NULL value. This
      case happens when "start" has a value of 0 and we fail to allocate memory
      for the "pages" pointer. When that happens we jump to the "cont" label and
      then enter the "if (start == 0)" branch where we immediately call the
      cow_file_range_inline() function. If that function returns 0 (success
      creating an inline extent) or an error (like -ENOMEM for example) we jump
      to the "free_pages_out" label and then access "pages[i]" leading to a NULL
      pointer dereference, since "nr_pages" has a value greater than zero at
      that point.
      
      Fix this by setting "nr_pages" to 0 when we fail to allocate memory for
      the "pages" pointer.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201119
      Fixes: 771ed689 ("Btrfs: Optimize compressed writeback and reads")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdda3175
    • Qu Wenruo's avatar
      btrfs: qgroup: Dirty all qgroups before rescan · 08374d8b
      Qu Wenruo authored
      commit 9c7b0c2e upstream.
      
      [BUG]
      In the following case, rescan won't zero out the number of qgroup 1/0:
      
        $ mkfs.btrfs -fq $DEV
        $ mount $DEV /mnt
      
        $ btrfs quota enable /mnt
        $ btrfs qgroup create 1/0 /mnt
        $ btrfs sub create /mnt/sub
        $ btrfs qgroup assign 0/257 1/0 /mnt
      
        $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
        $ btrfs sub snap /mnt/sub /mnt/snap
        $ btrfs quota rescan -w /mnt
        $ btrfs qgroup show -pcre /mnt
        qgroupid         rfer         excl     max_rfer     max_excl parent  child
        --------         ----         ----     --------     -------- ------  -----
        0/5          16.00KiB     16.00KiB         none         none ---     ---
        0/257      1016.00KiB     16.00KiB         none         none 1/0     ---
        0/258      1016.00KiB     16.00KiB         none         none ---     ---
        1/0        1016.00KiB     16.00KiB         none         none ---     0/257
      
      So far so good, but:
      
        $ btrfs qgroup remove 0/257 1/0 /mnt
        WARNING: quotas may be inconsistent, rescan needed
        $ btrfs quota rescan -w /mnt
        $ btrfs qgroup show -pcre  /mnt
        qgoupid         rfer         excl     max_rfer     max_excl parent  child
        --------         ----         ----     --------     -------- ------  -----
        0/5          16.00KiB     16.00KiB         none         none ---     ---
        0/257      1016.00KiB     16.00KiB         none         none ---     ---
        0/258      1016.00KiB     16.00KiB         none         none ---     ---
        1/0        1016.00KiB     16.00KiB         none         none ---     ---
      	     ^^^^^^^^^^     ^^^^^^^^ not cleared
      
      [CAUSE]
      Before rescan we call qgroup_rescan_zero_tracking() to zero out all
      qgroups' accounting numbers.
      
      However we don't mark all qgroups dirty, but rely on rescan to do so.
      
      If we have any high level qgroup without children, it won't be marked
      dirty during rescan, since we cannot reach that qgroup.
      
      This will cause QGROUP_INFO items of childless qgroups never get updated
      in the quota tree, thus their numbers will stay the same in "btrfs
      qgroup show" output.
      
      [FIX]
      Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if
      we have childless qgroups, their QGROUP_INFO items will still get
      updated during rescan.
      Reported-by: default avatarMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      Tested-by: default avatarMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08374d8b
    • Filipe Manana's avatar
      Btrfs: fix wrong dentries after fsync of file that got its parent replaced · 45be291e
      Filipe Manana authored
      commit 0f375eed upstream.
      
      In a scenario like the following:
      
        mkdir /mnt/A               # inode 258
        mkdir /mnt/B               # inode 259
        touch /mnt/B/bar           # inode 260
      
        sync
      
        mv /mnt/B/bar /mnt/A/bar
        mv -T /mnt/A /mnt/B
        fsync /mnt/B/bar
      
        <power fail>
      
      After replaying the log we end up with file bar having 2 hard links, both
      with the name 'bar' and one in the directory with inode number 258 and the
      other in the directory with inode number 259. Also, we end up with the
      directory inode 259 still existing and with the directory inode 258 still
      named as 'A', instead of 'B'. In this scenario, file 'bar' should only
      have one hard link, located at directory inode 258, the directory inode
      259 should not exist anymore and the name for directory inode 258 should
      be 'B'.
      
      This incorrect behaviour happens because when attempting to log the old
      parents of an inode, we skip any parents that no longer exist. Fix this
      by forcing a full commit if an old parent no longer exists.
      
      A test case for fstests follows soon.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45be291e