1. 03 Aug, 2015 40 commits
    • compiler-intel: fix wrong compiler barrier() macro · 9116f601
      Daniel Borkmann authored
      commit b86a50c3 upstream.
      
      Cleanup commit 73679e50 ("compiler-intel.h: Remove duplicate
      definition") removed the double definition of __memory_barrier()
      intrinsics.
      
      However, in doing so, it also removed the preceding #undef barrier by
      accident, meaning, the actual barrier() macro from compiler-gcc.h with
      inline asm is still in place as __GNUC__ is provided.
      
      Subsequently, barrier() can never be defined as __memory_barrier() from
      compiler.h since it already has a definition in place and if we trust
      the comment in compiler-intel.h, ecc doesn't support gcc specific asm
      statements.
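
The include-order effect described above can be sketched in plain C. This is a minimal illustration, not the kernel headers themselves: gcc_asm_barrier() and intel_memory_barrier() are hypothetical stand-ins for the inline-asm barrier and the __memory_barrier() intrinsic.

```c
/* Counters so the chosen implementation can be observed. */
int gcc_calls, intel_calls;

/* Hypothetical stand-ins for the two real barrier implementations. */
void gcc_asm_barrier(void)      { gcc_calls++; }
void intel_memory_barrier(void) { intel_calls++; }

/* Step 1 (role of compiler-gcc.h): seen first because __GNUC__ is defined. */
#define barrier() gcc_asm_barrier()

/* Step 2 (role of compiler-intel.h): drop the gcc definition.
 * Without this #undef, the #ifndef in step 3 never fires. */
#undef barrier

/* Step 3 (role of compiler.h): define barrier() only if nothing else did. */
#ifndef barrier
#define barrier() intel_memory_barrier()
#endif

void use_barrier(void)
{
    barrier();
}
```

Deleting the #undef line in step 2 makes use_barrier() call the gcc stand-in instead, which mirrors the regression this commit fixes.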
      
      I don't have an ecc at hand (unsure if that's still used in the field?)
      and only found this by accident during code review, so a revert of that
      cleanup would be the simplest option.
      
      Fixes: 73679e50 ("compiler-intel.h: Remove duplicate definition")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: mancha security <mancha1@zoho.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • firmware: dmi_scan: Only honor end-of-table for 64-bit tables · 534cc628
      Jean Delvare authored
      commit 17cd5bd5 upstream.
      
      A 32-bit entry point to a DMI table says how many structures the table
      contains. The SMBIOS specification explicitly says that end-of-table
      markers should be ignored if they are not actually at the end of the
      DMI table. So only honor the end-of-table marker for tables accessed
      through 64-bit entry points, as they do not specify a structure count.
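
A minimal sketch of the rule above, assuming a simplified table walker (dmi_walk_sketch() is a hypothetical name, not the kernel's function). It assumes the usual SMBIOS structure layout: a header with type, length and a 16-bit handle, followed by a string set terminated by two NUL bytes, with type 127 as the end-of-table marker. A structure count of 0 stands for a 64-bit entry point, which provides no count.

```c
#include <stddef.h>
#include <stdint.h>

#define DMI_END_OF_TABLE 127

/* Returns how many structures were visited. num == 0 means "no count
 * available" (64-bit entry point), so the end-of-table marker is honored.
 * With num != 0 (32-bit entry point) the marker is ignored and the count
 * alone decides where the walk stops. */
size_t dmi_walk_sketch(const uint8_t *buf, size_t len, unsigned int num)
{
    const uint8_t *p = buf, *end = buf + len;
    size_t seen = 0;

    while ((num == 0 || seen < num) && (size_t)(end - p) >= 4) {
        uint8_t type = p[0], hlen = p[1];
        const uint8_t *s;

        if (hlen < 4 || (size_t)(end - p) < hlen)
            break;                      /* malformed header */

        /* Skip the string set that follows the formatted area. */
        s = p + hlen;
        while ((size_t)(end - s) >= 2 && (s[0] || s[1]))
            s++;
        if ((size_t)(end - s) < 2)
            break;                      /* unterminated string set */

        seen++;
        if (num == 0 && type == DMI_END_OF_TABLE)
            break;                      /* only trusted without a count */
        p = s + 2;
    }
    return seen;
}
```

With a table holding a type 1 structure, an end-of-table marker and then a type 2 structure, a count of 3 visits all three, while a count of 0 stops at the marker.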
      
      Fixes: fc430262 ("dmi: add support for SMBIOS 3.0 64-bit entry point")
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM / sleep: Increase default DPM watchdog timeout to 60 · 01fed233
      Takashi Iwai authored
      commit fff3b16d upstream.
      
      Many hard disks (mostly WD ones) have firmware problems and take too
      long, more than 10 seconds, to resume from suspend.  This often
      exceeds the default DPM watchdog timeout (12 seconds), resulting in a
      sudden kernel panic.
      
      Since most distros just take the default as is, we should provide a
      safer value.  This patch increases the default value from 12
      seconds to one minute, which has been confirmed to be long enough for
      such problematic disks.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=91921
      Fixes: 70fea60d (PM / Sleep: Detect device suspend/resume lockup and log event)
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/hugetlb: introduce minimum hugepage order · c791ad1e
      Naoya Horiguchi authored
      commit 641844f5 upstream.
      
      Currently the initial value of order in dissolve_free_huge_pages() is 64
      or 32, which leads to the following warning from the static checker:
      
        mm/hugetlb.c:1203 dissolve_free_huge_pages()
        warn: potential right shift more than type allows '9,18,64'
      
      This risks an infinite loop, because 1 << order (== 0) is used as the
      stride of a for loop like this:
      
        for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)
            ...
      
      So this patch fixes it by using global minimum_order calculated at boot time.
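
The idea can be sketched in userspace C as follows. This is an illustration under assumptions, not the patch itself: minimum_order_of() and count_strides() are hypothetical helpers, and the guard against oversized shifts is added here only because such a shift is undefined in C.

```c
#include <limits.h>
#include <stddef.h>

/* Smallest order among the configured huge page sizes; the kernel patch
 * computes a comparable global once at boot time. */
unsigned int minimum_order_of(const unsigned int *orders, size_t n)
{
    unsigned int min = UINT_MAX;

    for (size_t i = 0; i < n; i++)
        if (orders[i] < min)
            min = orders[i];
    return min;
}

/* Counts loop iterations when stepping by 1 << order pages. */
unsigned long count_strides(unsigned long start_pfn, unsigned long end_pfn,
                            unsigned int order)
{
    unsigned long steps = 0;

    /* A shift by the full type width is undefined; a stride of 0 would
     * never advance pfn, so refuse such orders outright. */
    if (order >= sizeof(unsigned long) * CHAR_BIT)
        return 0;

    for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn += 1UL << order)
        steps++;
    return steps;
}
```

With orders 18 and 9 configured, the minimum is 9, and walking pfns 0 to 2048 takes four strides of 512 pages each.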
      
          text    data     bss     dec     hex filename
         28313     469   84236  113018   1b97a mm/hugetlb.o
         28256     473   84236  112965   1b945 mm/hugetlb.o (patched)
      
      Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: remove platform_sysrq_reset_seq · 0bcd7774
      Arnd Bergmann authored
      commit ffb6e0c9 upstream.
      
      The platform_sysrq_reset_seq code was intended as a way for an embedded
      platform to provide its own sysrq sequence at compile time. After over two
      years, nobody has started using it in an upstream kernel, and the platforms
      that were interested in it have moved on to devicetree, which can be used
      to configure the sequence without requiring kernel changes. The method is
      also incompatible with the way that most architectures build support for
      multiple platforms into a single kernel.
      
      Now the code is producing warnings when built with gcc-5.1:
      
      drivers/tty/sysrq.c: In function 'sysrq_init':
      drivers/tty/sysrq.c:959:33: warning: array subscript is above array bounds [-Warray-bounds]
         key = platform_sysrq_reset_seq[i];
      
      We could fix this, but it seems unlikely that it will ever be used, so
      let's just remove the code instead. We still have the option to pass the
      sequence either in DT, using the kernel command line, or using the
      /sys/module/sysrq/parameters/reset_seq file.
      
      Fixes: 154b7a48 ("Input: sysrq - allow specifying alternate reset sequence")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • RDMA/ocrdma: fix double free on pd · f354666d
      Colin Ian King authored
      commit 4dc54442 upstream.
      
      A reorganisation of the PD allocation and deallocation in commit
      9ba1377d ("RDMA/ocrdma: Move PD resource management to driver.")
      introduced a double free on pd, as detected by static analysis by
      smatch:
      
      drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:682 ocrdma_alloc_pd()
        error: double free of 'pd'
      
      The original call to ocrdma_mbx_dealloc_pd() (which does not kfree
      pd) was replaced with a call to _ocrdma_dealloc_pd() (which does
      kfree pd).  The kfree following this call causes the double free,
      so just remove it to fix the problem.
      
      Fixes: 9ba1377d ("RDMA/ocrdma: Move PD resource management to driver.")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-By: Devesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM / clk: Fix clock error check in __pm_clk_add() · 32419b85
      Geert Uytterhoeven authored
      commit 3fc3a0be upstream.
      
      In the final iteration of commit 245bd6f6 ("PM / clock_ops: Add
      pm_clk_add_clk()"), a refcount increment was added by Grygorii Strashko.
      However, the accompanying IS_ERR() check operates on the wrong clock
      pointer, which is always zero at this point, i.e. not an error.
      This may lead to a NULL pointer dereference later, when __clk_get()
      tries to dereference an error pointer.
      
      Check the passed clock pointer instead to fix this.
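
A userspace sketch of the corrected check, under stated assumptions: err_ptr() and is_err() imitate the kernel's ERR_PTR()/IS_ERR() convention (error codes encoded in the top 4095 pointer values), struct clock_entry and add_clock() are hypothetical simplifications, and the -ENOENT return value is chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitation of the kernel's error-pointer convention. */
void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct clock_entry {
    void *clk;
};

/* ce is assumed to be freshly zeroed, so ce->clk is NULL on entry.
 * Testing is_err(ce->clk) here would never fail, which is the bug
 * described above; the passed-in pointer is what must be checked. */
int add_clock(struct clock_entry *ce, void *clk)
{
    if (is_err(clk))
        return -ENOENT;
    ce->clk = clk;
    return 0;
}
```

Passing err_ptr(-EINVAL) is rejected and leaves ce->clk untouched, while a valid pointer is stored.
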
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Fixes: 245bd6f6 ("PM / clock_ops: Add pm_clk_add_clk()")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: sdhci: Restore behavior while creating OCR mask · 55df3292
      Ulf Hansson authored
      commit 5fd26c7e upstream.
      
      Commit 3a48edc4 ("mmc: sdhci: Use mmc core regulator infrastucture")
      changed the behavior for how to assign the ocr_avail mask for the mmc
      host. More precisely it started to mask the bits instead of assigning
      them.
      
      Restore the behavior, but also make it clear that an OCR mask created
      from an external regulator overrides the other ones. The OCR mask is
      determined by one of the following with this priority:
      
      1. Supported ranges of external regulator if one supplies VDD
      2. Host OCR mask if set by the driver (based on DT properties)
      3. The capabilities reported by the controller itself
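
The priority list above can be sketched as a small selection function. pick_ocr_avail() is a hypothetical helper, and a mask value of 0 is assumed to mean "this source did not provide one".

```c
#include <stdint.h>

/* Assign, rather than AND together, the OCR mask from the highest
 * priority source that provides one. */
uint32_t pick_ocr_avail(uint32_t regulator_ocr, uint32_t host_ocr,
                        uint32_t caps_ocr)
{
    uint32_t ocr = caps_ocr;        /* 3. controller capabilities */

    if (host_ocr)
        ocr = host_ocr;             /* 2. host OCR mask set by the driver */
    if (regulator_ocr)
        ocr = regulator_ocr;        /* 1. external regulator supplying VDD */
    return ocr;
}
```

By contrast, masking the sources together (for example caps_ocr & regulator_ocr) can produce an empty mask when the ranges do not overlap, which is the kind of behavior change the commit describes.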
      
      Fixes: 3a48edc4 ("mmc: sdhci: Use mmc core regulator infrastucture")
      Cc: Tim Kryger <tim.kryger@gmail.com>
      Reported-by: Yangbo Lu <yangbo.lu@freescale.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Reviewed-by: Tim Kryger <tim.kryger@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: card: Fixup request missing in mmc_blk_issue_rw_rq · f213f0f7
      Ding Wang authored
      commit 29535f7b upstream.
      
      The current handler of MMC_BLK_CMD_ERR in the mmc_blk_issue_rw_rq()
      function may cause a newly arrived request to be lost permanently when
      the ongoing (previously started) request completes.
      
      The problem scenario is as follows:
      (1) Request A is ongoing;
      (2) Request B arrives, and eventually mmc_blk_issue_rw_rq() is called;
      (3) Request A encounters the MMC_BLK_CMD_ERR error;
      (4) In the error handling of MMC_BLK_CMD_ERR, suppose mmc_blk_cmd_err()
          ends request A as completed and returns zero. The error handling
          continues, and suppose mmc_blk_reset() resets the device
          successfully;
      (5) Execution continues, and the while loop terminates because the
          variable ret is now zero;
      (6) Finally, mmc_blk_issue_rw_rq() returns without processing request B.
      
      The process related to the missing request may wait forever for that
      I/O request to complete, possibly crashing the application or hanging
      the system.
      
      Fix this issue by starting the new request when the reset succeeds.
      Signed-off-by: Ding Wang <justin.wang@spreadtrum.com>
      Fixes: 67716327 ("mmc: block: add eMMC hardware reset support")
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: samsung: only use earlycon for console · 06ab12e6
      Arnd Bergmann authored
      commit 357d5615 upstream.
      
      A configuration that enables earlycon but not the core console
      code causes a link error:
      
        drivers/built-in.o: In function `setup_earlycon':
        drivers/tty/serial/earlycon.c:70: undefined reference to `uart_parse_earlycon'
      
      That error can be triggered by the newly added samsung earlycon support,
      which is missing a 'select' statement.
      
      As suggested by Peter Hurley, solve the problem by moving the
      'select SERIAL_EARLYCON' statement to the samsung console driver
      option, as is done by all other console drivers.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: b94ba032 ("serial: samsung: Add support for early console")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel · 1d7a398b
      Jiang Liu authored
      commit 1fb01ca9 upstream.
      
      Zoltan Boszormenyi reported this regression:
        "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
         1565:230e) network chip on the mainboard. After the r8169 driver loaded
         the IRQs in the machine went berserk. Keyboard keypressed arrived with
         considerable latency and duplicated, so no real work was possible.
         The machine responded to the power button but didn't actually power
         down. It just stuck at the powering down message. I had to press the
         power button for 4 seconds to power it down.
      
         The computer is a POS machine with a big battery inside. Because of this,
         either ACPI or the Realtek chip kept the bad state and after rebooting,
         the network chip didn't even show up in lspci. Not even the PXE ROM
         announced itself during boot. I had to disconnect the battery to beat
         some sense back to the computer.
      
         The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
         good."
      
      The regression is caused by commit 593669c2 (x86/PCI/ACPI: Use common
      ACPI resource interfaces to simplify implementation). Since commit
      593669c2, x86 PCI ACPI host bridge driver validates ACPI resources by
      first converting an ACPI resource to a 'struct resource' structure and
      then applying checks against the converted resource structure. The 'start'
      and 'end' fields in 'struct resource' are defined to be type of
      resource_size_t, which may be 32 bits or 64 bits depending on
      CONFIG_PHYS_ADDR_T_64BIT.
      
      This may cause incorrect resource validation results with 32-bit kernels
      because 64-bit ACPI resource descriptors may get truncated when converting
      to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
      affects PCI resource allocation subsystem and makes some PCI devices and
      the system behave abnormally due to incorrect resource assignment.
      
      So enhance the ACPI resource parsing interfaces to ignore ACPI resource
      descriptors with address/offset above 4G when running in 32-bit mode.
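
The truncation can be reproduced in userspace with a short sketch. It assumes a 32-bit resource_size_t (a kernel without CONFIG_PHYS_ADDR_T_64BIT); struct res_sketch and convert_window() are hypothetical names, and the convert-then-compare check is one plausible way to detect lost bits, not necessarily the exact test used by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit kernel, so resource_size_t is 32 bits wide. */
typedef uint32_t resource_size_t;

struct res_sketch {
    resource_size_t start;
    resource_size_t end;
};

/* Converts a 64-bit ACPI address window and reports whether it survived
 * the conversion intact. A false return means the descriptor lies
 * (partly) above 4G and should be ignored. */
bool convert_window(uint64_t start, uint64_t end, struct res_sketch *res)
{
    res->start = (resource_size_t)start;
    res->end = (resource_size_t)end;
    return res->start == start && res->end == end;
}
```

A window starting at 0x100000000 converts to a start of 0, which would otherwise look like a valid low-memory resource and mislead PCI resource allocation.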
      
      With the fix applied, the behavior of the machine was restored to how
      3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
      and lspci -vvxxx shows that everything is at the same memory window as
      they were with 3.18.16.
      Reported-and-tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
      Fixes: 593669c2 (x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation)
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPICA: Tables: Enable default 64-bit FADT addresses favor · 24b2b68e
      Lv Zheng authored
      commit 0ea61381 upstream.
      
      ACPICA commit 4da56eeae0749dfe8491285c1e1fad48f6efafd8
      
      The following commit temporarily disabled the favoring of correct 64-bit
      FADT addresses while the root cause of the bug remained unfixed:
       Commit: 85dbd580
       ACPICA: Tables: Restore old behavor to favor 32-bit FADT addresses.
      
      With enough protections, this patch re-enables 64-bit FADT addresses by
      default. If regressions are reported against such change, this patch should
      be bisected and reverted.
      Note that 64-bit FACS favor and 64-bit firmware waking vector favor are
      excluded by this commit in order not to break OSPMs. Lv Zheng.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021
      Link: https://github.com/acpica/acpica/commit/4da56eea
      Reported-and-tested-by: Oswald Buddenhagen <ossi@kde.org>
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Signed-off-by: Bob Moore <robert.moore@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPICA: Tables: Fix an issue that FACS initialization is performed twice · c0f23125
      Lv Zheng authored
      commit c04be184 upstream.
      
      ACPICA commit 90f5332a15e9d9ba83831ca700b2b9f708274658
      
      This patch adds a new FACS initialization flag for acpi_tb_initialize().
      acpi_enable_subsystem() might be invoked several times in OS bootup process,
      and we don't want FACS initialization to be invoked twice. Lv Zheng.
      
      Link: https://github.com/acpica/acpica/commit/90f5332a
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Signed-off-by: Bob Moore <robert.moore@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPICA: Tables: Enable both 32-bit and 64-bit FACS · b1bce17e
      Lv Zheng authored
      commit c04e1fb4 upstream.
      
      ACPICA commit f7b86f35416e3d1f71c3d816ff5075ddd33ed486
      
      The following commit is reported to have broken s2ram on some platforms:
       Commit: 0249ed24
       ACPICA: Add option to favor 32-bit FADT addresses.
      The platform reports 2 FACS tables (which is not allowed by ACPI
      specification) and the new 32-bit address favor rule forces OSPMs to use
      the FACS table reported via FADT's X_FIRMWARE_CTRL field.
      
      The root cause of the reported bug might be one of the following:
      1. BIOS may favor the 64-bit firmware waking vector address when the
         version of the FACS is greater than 0 and Linux currently only supports
         resuming from the real mode, so the 64-bit firmware waking vector has
         never been set and might be invalid to BIOS while the commit enables
         higher version FACS.
      2. BIOS may favor the FACS reported via the "FIRMWARE_CTRL" field in the
         FADT while the commit doesn't set the firmware waking vector address of
         the FACS reported by "FIRMWARE_CTRL", it only sets the firmware waking
         vector address of the FACS reported by "X_FIRMWARE_CTRL".
      
      This patch excludes the cases that can trigger the bugs caused by the root
      cause 2.
      
      There is no handshaking mechanism that OSPM can use to tell BIOS which
      FACS is currently in use. Thus the FACS reported by "FIRMWARE_CTRL" may
      still be used by BIOS, and the 0 value of the 32-bit firmware waking
      vector might trigger such a failure.
      
      This patch tries to favor the 32-bit FACS address in another way: both the
      FACS reported by "FIRMWARE_CTRL" and the FACS reported by "X_FIRMWARE_CTRL"
      are loaded, so that a further commit can set the firmware waking vector in
      both tables to ensure we can exclude the cases that trigger the bugs caused
      by root cause 2. The exclusion is split into 2 commits because this commit
      is also useful for dumping more ACPI tables; it won't get reverted when
      such exclusion is no longer necessary. Lv Zheng.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021
      Link: https://github.com/acpica/acpica/commit/f7b86f35
      Reported-and-tested-by: Oswald Buddenhagen <ossi@kde.org>
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Signed-off-by: Bob Moore <robert.moore@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI / LPSS: Fix up acpi_lpss_create_device() · af3cc772
      Rafael J. Wysocki authored
      commit d3e13ff3 upstream.
      
      Fix a return value (which should be a negative error code) and a
      memory leak (the list allocated by acpi_dev_get_resources() needs
      to be freed on ioremap() errors too) in acpi_lpss_create_device()
      introduced by commit 4483d59e 'ACPI / LPSS: check the result
      of ioremap()'.
      
      Fixes: 4483d59e 'ACPI / LPSS: check the result of ioremap()'
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage · 3dfbf877
      Rafael J. Wysocki authored
      commit 0294112e upstream.
      
      This effectively reverts the following three commits:
      
       7bc10388 ACPI / resources: free memory on error in add_region_before()
       0f1b414d ACPI / PNP: Avoid conflicting resource reservations
       b9a5e5e1 ACPI / init: Fix the ordering of acpi_reserve_resources()
      
      (commit b9a5e5e1 introduced regressions some of which, but not
      all, were addressed by commit 0f1b414d and commit 7bc10388
      was a fixup on top of the latter) and causes ACPI fixed hardware
      resources to be reserved at the fs_initcall_sync stage of system
      initialization.
      
      The story is as follows.  First, a boot regression was reported due
      to an apparent resource reservation ordering change after a commit
      that shouldn't lead to such changes.  Investigation led to the
      conclusion that the problem happened because acpi_reserve_resources()
      was executed at the device_initcall() stage of system initialization
      which wasn't strictly ordered with respect to driver initialization
      (and with respect to the initialization of the pcieport driver in
      particular), so a random change causing the device initcalls to be
      run in a different order might break things.
      
      The response to that was to attempt to run acpi_reserve_resources()
      as soon as we knew that ACPI would be in use (commit b9a5e5e1).
      However, that turned out to be too early, because it caused resource
      reservations made by the PNP system driver to fail on at least one
      system and that failure was addressed by commit 0f1b414d.
      
      That fix still turned out to be insufficient, though, because
      calling acpi_reserve_resources() before the fs_initcall stage of
      system initialization caused a boot regression to happen on the
      eCAFE EC-800-H20G/S netbook.  That meant that we only could call
      acpi_reserve_resources() at the fs_initcall initialization stage
      or later, but then we might just as well call it after the PNP
      initialization in which case commit 0f1b414d wouldn't be
      necessary any more.
      
      For this reason, the changes made by commit 0f1b414d are reverted
      (along with a memory leak fixup on top of that commit), the changes
      made by commit b9a5e5e1 that went too far are reverted too and
      acpi_reserve_resources() is changed into fs_initcall_sync, which
      will cause it to be executed after the PNP subsystem initialization
      (which is an fs_initcall) and before device initcalls (including
      the pcieport driver initialization) which should avoid the initial
      issue.
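
The resulting ordering can be illustrated with a toy model. The numeric level values and the function names in the table are illustrative assumptions (the kernel orders initcalls through linker sections, not a sorted array); the only fact relied on is that fs_initcall runs before fs_initcall_sync, which runs before device_initcall.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical numeric encoding of three initcall stages. */
enum { LVL_FS = 50, LVL_FS_SYNC = 51, LVL_DEVICE = 60 };

struct initcall {
    const char *name;
    int level;
};

static int by_level(const void *a, const void *b)
{
    const struct initcall *x = a, *y = b;

    return x->level - y->level;
}

/* Sorts the calls by stage and returns the position of the named one,
 * or -1 if it is not present. */
int position_after_sort(struct initcall *calls, size_t n, const char *name)
{
    qsort(calls, n, sizeof *calls, by_level);
    for (size_t i = 0; i < n; i++)
        if (strcmp(calls[i].name, name) == 0)
            return (int)i;
    return -1;
}
```

Registering a PNP init at LVL_FS, acpi_reserve_resources at LVL_FS_SYNC and a pcieport init at LVL_DEVICE yields exactly that execution order, regardless of the order in which the entries were listed.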
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
      Link: http://marc.info/?t=143092384600002&r=1&w=2
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
      Link: http://marc.info/?t=143389402600001&r=1&w=2
      Fixes: b9a5e5e1 "ACPI / init: Fix the ordering of acpi_reserve_resources()"
      Reported-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ACPI / resources: free memory on error in add_region_before() · 2dfdaa26
      Dan Carpenter authored
      commit 7bc10388 upstream.
      
      There is a small memory leak on error.
      
      Fixes: 0f1b414d (ACPI / PNP: Avoid conflicting resource reservations)
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crush: fix a bug in tree bucket decode · 94fc3084
      Ilya Dryomov authored
      commit 82cd003a upstream.
      
      struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
      should be used.  -Wconversion catches this, but I guess it went
      unnoticed in all the noise it spews.  The actual problem (at least for
      common crushmaps) isn't the u32 -> u8 truncation though - it's the
      advancement by 4 bytes instead of 1 in the crushmap buffer.
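
The cursor problem can be shown with two toy decoders. decode_u8() and decode_u32_le() are hypothetical stand-ins for the ceph decode helpers (little-endian is assumed for the 32-bit case), and unlike the _safe variants they perform no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one byte and advances the cursor by 1. */
uint8_t decode_u8(const uint8_t **p)
{
    uint8_t v = (*p)[0];

    *p += 1;
    return v;
}

/* Reads a little-endian 32-bit value and advances the cursor by 4. */
uint32_t decode_u32_le(const uint8_t **p)
{
    uint32_t v = (uint32_t)(*p)[0] |
                 ((uint32_t)(*p)[1] << 8) |
                 ((uint32_t)(*p)[2] << 16) |
                 ((uint32_t)(*p)[3] << 24);

    *p += 4;
    return v;
}

/* Returns how many bytes decoding the one-byte num_nodes field consumes
 * with the chosen decoder. */
size_t consumed_by_num_nodes(const uint8_t *buf, int use_u8)
{
    const uint8_t *p = buf;

    if (use_u8)
        (void)decode_u8(&p);
    else
        (void)decode_u32_le(&p);
    return (size_t)(p - buf);
}
```

Even when the truncated value happens to be correct, the 32-bit decoder leaves the cursor three bytes too far, so every field decoded afterwards is read from the wrong place.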
      
      Fixes: http://tracker.ceph.com/issues/2759
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Josh Durgin <jdurgin@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fuse: initialize fc->release before calling it · 650b07ba
      Miklos Szeredi authored
      commit 0ad0b325 upstream.
      
      fc->release is called from fuse_conn_put() which was used in the error
      cleanup before fc->release was initialized.
      
      [Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
      fuse_conn_init(fc) instead of before.]
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Fixes: a325f9b9 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selinux: fix mprotect PROT_EXEC regression caused by mm change · 872d2790
      Stephen Smalley authored
      commit 892e8cac upstream.
      
      commit 66fc1303 ("mm: shmem_zero_setup
      skip security check and lockdep conflict with XFS") caused a regression
      for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
      shared anonymous mappings.  However, even before that regression, the
      checking on such mprotect PROT_EXEC calls was inconsistent with the
      checking on a mmap PROT_EXEC call for a shared anonymous mapping.  On a
      mmap, the security hook is passed a NULL file and knows it is dealing
      with an anonymous mapping and therefore applies an execmem check and no
      file checks.  On a mprotect, the security hook is passed a vma with a
      non-NULL vm_file (as this was set from the internally-created shmem
      file during mmap) and therefore applies the file-based execute check
      and no execmem check.  Since the aforementioned commit now marks the
      shmem zero inode with the S_PRIVATE flag, the file checks are disabled
      and we have no checking at all on mprotect PROT_EXEC.  Add a test to
      the mprotect hook logic for such private inodes, and apply an execmem
      check in that case.  This makes the mmap and mprotect checking
      consistent for shared anonymous mappings, as well as for /dev/zero and
      ashmem.
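
The decision described above can be reduced to a small sketch. It is a deliberate simplification: struct mapping_sketch and prot_exec_check() are hypothetical, and the real hook also considers conditions not modeled here.

```c
#include <stdbool.h>

enum exec_check {
    CHECK_EXECMEM,          /* treat as anonymous executable memory */
    CHECK_FILE_EXECUTE      /* apply the file-based execute check */
};

/* Hypothetical view of what the hook knows about the mapping. */
struct mapping_sketch {
    bool has_file;          /* non-NULL file / vm_file */
    bool inode_private;     /* backing inode is marked S_PRIVATE */
};

/* A mapping with no file, or one backed by a private (internal) inode,
 * gets the execmem check; anything else gets the file execute check. */
enum exec_check prot_exec_check(const struct mapping_sketch *m)
{
    if (!m->has_file || m->inode_private)
        return CHECK_EXECMEM;
    return CHECK_FILE_EXECUTE;
}
```

Under this rule a shared anonymous mapping gets the same execmem check at mmap time (no file) and at mprotect time (internal shmem file with a private inode).
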
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: Paul Moore <pmoore@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • selinux: don't waste ebitmap space when importing NetLabel categories · 9d680e03
      Paul Moore authored
      commit 33246035 upstream.
      
      At present we don't create efficient ebitmaps when importing NetLabel
      category bitmaps.  This can present a problem when comparing ebitmaps
      since ebitmap_cmp() is very strict about these things and considers
      these wasteful ebitmaps not equal when compared to their more
      efficient counterparts, even if their values are the same.  This isn't
      likely to cause problems on 64-bit systems due to a bit of luck on
      how NetLabel/CIPSO works and the default ebitmap size, but it can be
      a problem on 32-bit systems.
      
      This patch fixes this problem by being a bit more intelligent when
      importing NetLabel category bitmaps by skipping over empty sections
      which should result in a nice, efficient ebitmap.
      Signed-off-by: Paul Moore <pmoore@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix file corruption after cloning inline extents · df7c9ca8
      Filipe Manana authored
      commit ed958762 upstream.
      
      Using the clone ioctl (or extent_same ioctl, which calls the same extent
      cloning function as well) we end up allowing an inline extent to be copied
      from the source file into a non-zero offset of the destination file. This
      is unexpected, and the btrfs code is not prepared to deal with it: all
      inline extents must be at a file offset equal to 0.
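
The invariant can be expressed as a guard in a few lines. This is a sketch of the rule only, not of the patch, which may handle additional cases; check_clone_dest() is a hypothetical helper, and EOPNOTSUPP is the error the commit's own test case expects.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Rejects a clone that would place an inline extent anywhere other than
 * file offset 0 in the destination. Returns 0 when the clone may go on. */
int check_clone_dest(bool src_extent_is_inline, uint64_t dest_offset)
{
    if (src_extent_is_inline && dest_offset != 0)
        return -EOPNOTSUPP;
    return 0;
}
```

Cloning an inline extent to offset 4K is refused, while cloning it to offset 0, or cloning a regular extent to any offset, passes the check.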
      
      For example, the following excerpt of a test case for fstests triggers
      a crash/BUG_ON() on a write operation after an inline extent is cloned
      into a non-zero offset:
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test files. File foo has the same 2K of data at offset 4K
        # as file bar has at its offset 0.
        $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
            -c "pwrite -S 0xbb 4k 2K" \
            -c "pwrite -S 0xcc 8K 4K" \
            $SCRATCH_MNT/foo | _filter_xfs_io
      
        # File bar consists of a single inline extent (2K size).
        $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
           $SCRATCH_MNT/bar | _filter_xfs_io
      
        # Now call the clone ioctl to clone the extent of file bar into file
        # foo at its offset 4K. This made file foo have an inline extent at
        # offset 4K, something which the btrfs code can not deal with in future
        # IO operations because all inline extents are supposed to start at an
        # offset of 0, resulting in all sorts of chaos.
        # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
        # what it returns for other cases dealing with inlined extents.
        $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
            $SCRATCH_MNT/bar $SCRATCH_MNT/foo
      
        # Because of the inline extent at offset 4K, the following write made
        # the kernel crash with a BUG_ON().
        $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
      
        status=0
        exit
      
      The stack trace of the BUG_ON() triggered by the last write is:
      
        [152154.035903] ------------[ cut here ]------------
        [152154.036424] kernel BUG at mm/page-writeback.c:2286!
        [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
        [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
        [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
        [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
        [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
        [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
        [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
        [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
        [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
        [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
        [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
        [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
        [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
        [152154.036424] Stack:
        [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
        [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
        [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
        [152154.036424] Call Trace:
        [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
        [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
        [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
        [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
        [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
        [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
        [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
        [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
        [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
        [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
        [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
        [152154.036424]  RSP <ffff880429effc68>
        [152154.242621] ---[ end trace e3d3376b23a57041 ]---
      
      Fix this by returning the error EOPNOTSUPP if an attempt to copy an
      inline extent into a non-zero offset happens, just like what is done for
      other scenarios that would require copying/splitting inline extents,
      which were introduced by the following commits:
      
         00fdf13a ("Btrfs: fix a crash of clone with inline extents's split")
         3f9e3df8 ("btrfs: replace error code from btrfs_drop_extents")
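
The rule described above can be sketched as a small userspace check. This is only an illustration: `check_inline_clone()` and its parameters are hypothetical names, not the btrfs implementation, and the only thing taken from the commit message is the behaviour (refuse with EOPNOTSUPP when an inline extent would land at a non-zero offset).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a btrfs function): decide whether cloning a
 * source extent to dest_offset is allowed. Inline extents are only valid
 * at file offset 0, so any other destination is refused.
 */
static int check_inline_clone(bool src_is_inline, uint64_t dest_offset)
{
    if (src_is_inline && dest_offset != 0)
        return -EOPNOTSUPP;   /* same error as the other inline-extent cases */
    return 0;                 /* regular extents, or inline at offset 0 */
}
```

With this sketch, cloning the 2K inline extent of file bar to offset 4K of file foo would be rejected, while cloning it to offset 0 would still be allowed.
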
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df7c9ca8
    • Filipe Manana's avatar
      Btrfs: fix list transaction->pending_ordered corruption · 98f7bfe6
      Filipe Manana authored
      commit d3efe084 upstream.
      
      When we call btrfs_commit_transaction(), we splice the list "ordered"
      of our transaction handle into the transaction's "pending_ordered"
      list, but we don't re-initialize the "ordered" list of our transaction
      handle, which means it still points to the same elements it used to
      before the splice. Then we check if the current transaction's state is
      >= TRANS_STATE_COMMIT_START and if it is we end up calling
      btrfs_end_transaction() which simply splices again the "ordered" list
      of our handle into the transaction's "pending_ordered" list, leaving
      multiple pointers to the same ordered extents which results in list
      corruption when we are iterating, removing and freeing ordered extents
      at btrfs_wait_pending_ordered(), resulting in access to dangling
      pointers / use-after-free issues.
      Similarly, btrfs_end_transaction() can end up in some cases calling
      btrfs_commit_transaction(), and both did a list splice of the transaction
      handle's "ordered" list into the transaction's "pending_ordered" without
      re-initializing the handle's "ordered" list, resulting in exactly the
      same problem.
      
      This produces the following warning on a kernel with linked list
      debugging enabled:
      
      [109749.265416] ------------[ cut here ]------------
      [109749.266410] WARNING: CPU: 7 PID: 324 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
      [109749.267969] list_del corruption. prev->next should be ffff8800ba087e20, but was fffffff8c1f7c35d
      (...)
      [109749.287505] Call Trace:
      [109749.288135]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
      [109749.298080]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
      [109749.331605]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
      [109749.334849]  [<ffffffff81260642>] ? __list_del_entry+0x5a/0x98
      [109749.337093]  [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
      [109749.337847]  [<ffffffff81260642>] __list_del_entry+0x5a/0x98
      [109749.338678]  [<ffffffffa053e8bf>] btrfs_wait_pending_ordered+0x46/0xdb [btrfs]
      [109749.340145]  [<ffffffffa058a65f>] ? __btrfs_run_delayed_items+0x149/0x163 [btrfs]
      [109749.348313]  [<ffffffffa054077d>] btrfs_commit_transaction+0x36b/0xa10 [btrfs]
      [109749.349745]  [<ffffffff81087310>] ? trace_hardirqs_on+0xd/0xf
      [109749.350819]  [<ffffffffa055370d>] btrfs_sync_file+0x36f/0x3fc [btrfs]
      [109749.351976]  [<ffffffff8118ec98>] vfs_fsync_range+0x8f/0x9e
      [109749.360341]  [<ffffffff8118ecc3>] vfs_fsync+0x1c/0x1e
      [109749.368828]  [<ffffffff8118ee1d>] do_fsync+0x34/0x4e
      [109749.369790]  [<ffffffff8118f045>] SyS_fsync+0x10/0x14
      [109749.370925]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
      [109749.382274] ---[ end trace 48e0d07f7c03d95a ]---
      
      On a non-debug kernel this leads to invalid memory accesses, causing a
      crash. Fix this by using list_splice_init() instead of list_splice() in
      btrfs_commit_transaction() and btrfs_end_transaction().
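
The difference between the two splice variants can be shown with a minimal userspace list. This is a sketch modelled on the kernel's `struct list_head`, not the kernel code itself; `list_count()` and its `limit` parameter are hypothetical helpers added only for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/*
 * Move all entries of src to the front of dst. src itself is NOT reset,
 * so afterwards it still points at entries that now belong to dst.
 */
static void list_splice(struct list_head *src, struct list_head *dst)
{
    assert(src != NULL && dst != NULL);
    if (list_empty(src))
        return;

    struct list_head *first = src->next;
    struct list_head *last = src->prev;
    struct list_head *at = dst->next;

    first->prev = dst;
    dst->next = first;
    last->next = at;
    at->prev = last;
}

/* Same as list_splice(), but leaves src as a valid empty list. */
static void list_splice_init(struct list_head *src, struct list_head *dst)
{
    list_splice(src, dst);
    list_init(src);
}

/* Hypothetical helper: count entries, bounded so a damaged list cannot loop forever. */
static size_t list_count(const struct list_head *h, size_t limit)
{
    size_t n = 0;

    for (const struct list_head *p = h->next; p != h && n < limit; p = p->next)
        n++;
    return n;
}
```

After a plain `list_splice()`, the source head is not empty and its `next` pointer still refers to an entry owned by the destination, so a second splice from the same head would link those entries in twice. After `list_splice_init()`, the source is empty and a repeated splice is a no-op.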
      
      Fixes: 50d9aa99 ("Btrfs: make sure logged extents complete in the current transaction V3")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      98f7bfe6
    • Filipe Manana's avatar
      Btrfs: fix memory leak in the extent_same ioctl · 992a3fbb
      Filipe Manana authored
      commit 497b4050 upstream.
      
      We were allocating memory with memdup_user() but we were never releasing
      that memory. This affected pretty much every call to the ioctl, whether
      it deduplicated extents or not.
      
      This issue was reported on IRC by Julian Taylor and on the mailing list
      by Marcel Ritter, credit goes to them for finding the issue.
      Reported-by: Julian Taylor <jtaylor.debian@googlemail.com>
      Reported-by: Marcel Ritter <ritter.marcel@gmail.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      992a3fbb
    • Filipe Manana's avatar
      Btrfs: fix fsync data loss after append write · 544f8fbe
      Filipe Manana authored
      commit e4545de5 upstream.
      
      If we do an append write to a file (which increases its inode's i_size)
      that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
      and the previous transaction added a new hard link to the file, which sets
      the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
      the file, the inode's new i_size isn't logged. This has the consequence
      that after the fsync log is replayed, the file size remains what it was
      before the append write operation, which means users/applications will
      not be able to read the data that was successfully fsync'ed before.
      
      This happens because neither the inode item nor the delayed inode get
      their i_size updated when the append write is made - doing so would
      require starting a transaction in the buffered write path, something that
      we do not do intentionally for performance reasons.
      
      Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
      set the inode is logged with its current i_size (log the in-memory inode
      into the log tree).
      
      This issue is not a recent regression and is easy to reproduce with the
      following test case for fstests:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        here=`pwd`
        tmp=/tmp/$$
        status=1	# failure is the default!
      
        _cleanup()
        {
                _cleanup_flakey
                rm -f $tmp.*
        }
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _supported_fs generic
        _supported_os Linux
        _need_to_be_root
        _require_scratch
        _require_dm_flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        _crash_and_mount()
        {
                # Simulate a crash/power loss.
                _load_flakey_table $FLAKEY_DROP_WRITES
                _unmount_flakey
                # Allow writes again and mount. This makes the fs replay its fsync log.
                _load_flakey_table $FLAKEY_ALLOW_WRITES
                _mount_flakey
        }
      
        rm -f $seqres.full
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create the test file with some initial data and then fsync it.
        # The fsync here is only needed to trigger the issue in btrfs, as it causes
        # the flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
                        -c "fsync" \
                        $SCRATCH_MNT/foo | _filter_xfs_io
        sync
      
        # Add a hard link to our file.
        # On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
        # which is a necessary condition to trigger the issue.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar
      
        # Sync the filesystem to force a commit of the current btrfs transaction, this
        # is a necessary condition to trigger the bug on btrfs.
        sync
      
        # Now append more data to our file, increasing its size, and fsync the file.
        # In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
        # write path did not update the inode item in the btree nor the delayed inode
        # item (in memory structure) in the current transaction (created by the fsync
        # handler), the fsync did not record the inode's new i_size in the fsync
        # log/journal. This made the data unavailable after the fsync log/journal is
        # replayed.
        $XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
                     -c "fsync" \
                     $SCRATCH_MNT/foo | _filter_xfs_io
      
        echo "File content after fsync and before crash:"
        od -t x1 $SCRATCH_MNT/foo
      
        _crash_and_mount
      
        echo "File content after crash and log replay:"
        od -t x1 $SCRATCH_MNT/foo
      
        status=0
        exit
      
      The file content before and after the crash/power failure is expected to
      include the appended data, which is:
      
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
        *
        0200000
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      544f8fbe
    • Filipe Manana's avatar
      Btrfs: fix race between caching kthread and returning inode to inode cache · 9547e86b
      Filipe Manana authored
      commit ae9d8f17 upstream.
      
      While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
      we could have a concurrent call to btrfs_return_ino() that adds a new
      entry to the root's free space cache of pinned inodes. This concurrent
      call does not acquire the fs_info->commit_root_sem before adding a new
      entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
      because the caching kthread calls btrfs_unpin_free_ino() after setting
      the caching state to BTRFS_CACHE_FINISHED and therefore races with
      the task calling btrfs_return_ino(), which is adding a new entry, while
      the former (caching kthread) is navigating the cache's rbtree, removing
      and freeing nodes from the cache's rbtree without acquiring the spinlock
      that protects the rbtree.
      
      This race resulted in memory corruption due to double free of struct
      btrfs_free_space objects because both tasks can end up freeing the
      same objects. Note that adding a new entry can result in merging it with
      other entries in the cache, in which case those entries are freed.
      This is particularly important as btrfs_free_space structures are also
      used for the block group free space caches.
      
      This memory corruption can be detected by a debugging kernel, which
      reports it with the following trace:
      
      [132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
      [132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
      [132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
      [132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
      [132408.505075] Call Trace:
      [132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
      [132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
      [132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
      [132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
      [132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
      [132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
      [132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
      [132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
      [132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
      [132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
      [132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
      [132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
      [132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
      [132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
      [132409.503355] ------------[ cut here ]------------
      [132409.504241] kernel BUG at mm/slab.c:2571!
      
      Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
      that protects the rbtree while doing the searches and removing entries.
      
      Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9547e86b
    • Filipe Manana's avatar
      Btrfs: use kmem_cache_free when freeing entry in inode cache · 6f953ad8
      Filipe Manana authored
      commit c3f4a168 upstream.
      
      The free space entries are allocated using kmem_cache_zalloc(),
      through __btrfs_add_free_space(), therefore we should use
      kmem_cache_free() and not kfree() to avoid any confusion and
      any potential problem. Looking at the kfree() definition at
      mm/slab.c it has the following comment:
      
        /*
         * (...)
         *
         * Don't free memory not originally allocated by kmalloc()
         * or you will run into trouble.
         */
      
      So better be safe and use kmem_cache_free().
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f953ad8
    • Firo Yang's avatar
      md: fix a build warning · 528feaea
      Firo Yang authored
      commit 4e023612 upstream.
      
      Warning like this:
      
      drivers/md/md.c: In function "update_array_info":
      drivers/md/md.c:6394:26: warning: logical not is only applied
      to the left hand side of comparison [-Wlogical-not-parentheses]
            !mddev->persistent  != info->not_persistent||
      
      Fix it as Neil Brown said:
      mddev->persistent != !info->not_persistent ||
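
How the compiler reads the two forms can be sketched in plain C. The function and parameter names below are hypothetical stand-ins for `mddev->persistent` and `info->not_persistent`; the first function writes out, with explicit parentheses, the parse that the warning is about.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The warned-about form: '!' applies only to the left operand, so
 * "!persistent != not_persistent" is parsed as shown here.
 */
static bool changed_warned(int persistent, int not_persistent)
{
    return (!persistent) != not_persistent;
}

/* The form suggested in the commit message: negate the right operand instead. */
static bool changed_fixed(int persistent, int not_persistent)
{
    return persistent != !not_persistent;
}
```

For operands that are strictly 0 or 1 the two functions agree on all four combinations. They can disagree once an operand holds another non-zero value, for example `changed_warned(2, 0)` is false while `changed_fixed(2, 0)` is true, which is one reason the explicit form is easier to reason about.
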
      Signed-off-by: Firo Yang <firogm@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      528feaea
    • Omar Sandoval's avatar
      Btrfs: don't invalidate root dentry when subvolume deletion fails · 54b1fb57
      Omar Sandoval authored
      commit 64ad6c48 upstream.
      
      Since commit bafc9b75 ("vfs: More precise tests in d_invalidate"),
      mounted subvolumes can be deleted because d_invalidate() won't fail.
      However, we run into problems when we attempt to delete the default
      subvolume while it is mounted as the root filesystem:
      
      	# btrfs subvol list /
      	ID 257 gen 306 top level 5 path rootvol
      	ID 267 gen 334 top level 5 path snap1
      	# btrfs subvol get-default /
      	ID 267 gen 334 top level 5 path snap1
      	# btrfs inspect-internal rootid /
      	267
      	# mount -o subvol=/ /dev/vda1 /mnt
      	# btrfs subvol del /mnt/snap1
      	Delete subvolume (no-commit): '/mnt/snap1'
      	ERROR: cannot delete '/mnt/snap1' - Operation not permitted
      	# findmnt /
      	findmnt: can't read /proc/mounts: No such file or directory
      	# ls /proc
      	#
      
      Markus reported that this same scenario simply led to a kernel oops.
      
      This happens because in btrfs_ioctl_snap_destroy(), we call
      d_invalidate() before we check may_destroy_subvol(), which means that we
      detach the submounts and drop the dentry before erroring out. Instead,
      we should only invalidate the dentry once the deletion has succeeded.
      Additionally, the shrink_dcache_sb() isn't necessary; d_invalidate()
      will prune the dcache for the deleted subvolume.
      
      Fixes: bafc9b75 ("vfs: More precise tests in d_invalidate")
      Reported-by: Markus Schauler <mschauler@gmail.com>
      Signed-off-by: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      54b1fb57
    • Stefan Wahren's avatar
      ARM: dts: mx23: fix iio-hwmon support · 83719f40
      Stefan Wahren authored
      commit e8e94ed6 upstream.
      
      In order to get iio-hwmon support, the lradc must be declared as an
      iio provider. So fix this issue by adding the #io-channel-cells property.
      Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
      Fixes: bd798f9c ("ARM: dts: mxs: Add iio-hwmon to mx23 soc")
      Reviewed-by: Marek Vasut <marex@denx.de>
      Reviewed-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      83719f40
    • Constantine Shulyupin's avatar
      hwmon: (nct7802) fix visibility of temp3 · 2618fae8
      Constantine Shulyupin authored
      commit 56172d81 upstream.
      
      Excerpt from datasheet:
      7.2.32 Mode Selection Register
      RTD3_MD : 00=Closed , 01=Reserved , 10=Thermistor mode , 11=Voltage sense
      
      Show temp3 only in Thermistor mode
      Signed-off-by: Constantine Shulyupin <const@MakeLinux.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2618fae8
    • Stevens, Nick's avatar
      hwmon: (mcp3021) Fix broken output scaling · f9440235
      Stevens, Nick authored
      commit 347d7e45 upstream.
      
      The mcp3021 scaling code is dividing the VDD (full-scale) value in
      millivolts by the A2D resolution to obtain the scaling factor. When VDD
      is 3300mV (the standard value) and the resolution is 12-bit (4096
      divisions), the result is a scale factor of 3300/4096, which is always
      one.  Effectively, the raw A2D reading is always being returned because
      no scaling is applied.
      
      This patch fixes the issue and simplifies the register-to-volts
      calculation, removing the unneeded "output_scale" struct member.
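
The arithmetic problem can be reproduced in a few lines of C. This is a sketch, not the driver code: the helper names are hypothetical, and it assumes the division rounds to the nearest integer, which would explain the scale factor of one described above (a plain truncating division of 3300 by 4096 would give zero instead).

```c
#include <assert.h>

/* Round-to-nearest integer division for unsigned operands (assumed rounding mode). */
static unsigned int div_round_closest(unsigned int n, unsigned int d)
{
    assert(d != 0);
    return (n + d / 2) / d;
}

/* Problematic order: compute a per-step scale factor first, then multiply. */
static unsigned int mv_scale_first(unsigned int raw, unsigned int vdd_mv,
                                   unsigned int bits)
{
    unsigned int scale = div_round_closest(vdd_mv, 1u << bits);

    return raw * scale;   /* with 3300 mV and 12 bits, scale is 1 */
}

/* Multiply first and divide last, so no precision is lost early. */
static unsigned int mv_multiply_first(unsigned int raw, unsigned int vdd_mv,
                                      unsigned int bits)
{
    return div_round_closest(raw * vdd_mv, 1u << bits);
}
```

For a mid-scale raw reading of 2048 with VDD at 3300 mV and 12-bit resolution, `mv_scale_first()` returns 2048 (the raw value unchanged), while `mv_multiply_first()` returns 1650, which is half of VDD as expected.
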
      Signed-off-by: Nick Stevens <Nick.Stevens@digi.com>
      [Guenter Roeck: Dropped unnecessary value check]
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9440235
    • Goldwyn Rodrigues's avatar
      md: Skip cluster setup for dm-raid · 7640ca52
      Goldwyn Rodrigues authored
      commit d3b178ad upstream.
      
      There is a bug where the bitmap superblock isn't initialised properly for
      dm-raid, so new fields can contain garbage.
      (dm-raid does initialisation in the kernel - md initialised the
       superblock in mdadm).
      
      This means that for dm-raid we cannot currently trust the new ->nodes
      field. So:
       - use __GFP_ZERO to initialise the superblock properly for all new
          arrays
       - initialise all fields in bitmap_info in bitmap_new_disk_sb
       - ignore ->nodes for dm arrays (yes, this is a hack)
      
      This bug exposes dm-raid to a bug in the (still experimental) md-cluster
      code, so it is suitable for -stable.  It does cause crashes.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=100491
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7640ca52
    • NeilBrown's avatar
      md: unlock mddev_lock on an error path. · 0f9457af
      NeilBrown authored
      commit 9a8c0fa8 upstream.
      
      This error path returns while still holding the lock - bad.
      
      Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f9457af
    • NeilBrown's avatar
      md: clear mddev->private when it has been freed. · adeb846a
      NeilBrown authored
      commit bd691922 upstream.
      
      If ->private is set when ->run is called, it is assumed to be
      a 'config'  prepared as part of 'reshape'.
      
      So it is important when we free that config, that we also clear ->private.
      This is not often a problem as the mddev will normally be discarded
      shortly after the config is freed.
      However if an 'assemble' races with a final close, the assemble can use
      the old mddev which has a stale ->private.  This leads to any of
      various sorts of crashes.
      
      So clear ->private after calling ->free().
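
The pattern can be sketched in userspace C. Everything here is a hypothetical stand-in (`struct dev`, `dev_run()`, `dev_free_config()`), not the md code; it only illustrates why a pointer that doubles as a "config is prepared" flag has to be cleared when the config is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for mddev: 'private' doubles as a "config prepared" flag. */
struct dev {
    void *private;
};

/* Free the config and clear the pointer so later users do not see a stale value. */
static void dev_free_config(struct dev *d)
{
    assert(d != NULL);
    free(d->private);
    d->private = NULL;
}

/*
 * Returns 1 if an existing config was reused, 0 if a fresh one was
 * allocated, -1 on allocation failure.
 */
static int dev_run(struct dev *d)
{
    assert(d != NULL);
    if (d->private)
        return 1;               /* trusts whatever 'private' points at */
    d->private = calloc(1, 64);
    return d->private ? 0 : -1;
}
```

If `dev_free_config()` skipped the assignment to NULL, a later `dev_run()` on the same object would take the "reuse" branch and work with freed memory, which is the kind of stale pointer the message describes.
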
      Reported-by: Nate Clark <nate@neworld.us>
      Fixes: afa0f557 ("md: rename ->stop to ->free")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      adeb846a
    • Lior Amsalem's avatar
      dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup · 499b1532
      Lior Amsalem authored
      commit 9136291f upstream.
      
      This patch fixes a bug in the XOR driver where the cleanup function can be
      called and free descriptors that have never been processed by the engine
      (which results in data errors).
      
      The cleanup function will free descriptors based on the ownership bit in
      the descriptors.
      
      Fixes: ff7b0479 ("dmaengine: DMA engine driver for Marvell XOR engine")
      Signed-off-by: Lior Amsalem <alior@marvell.com>
      Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
      Reviewed-by: Ofer Heifetz <oferh@marvell.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      499b1532
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix sample output of dynamic arrays · 63544f7d
      Steven Rostedt (Red Hat) authored
      commit d6726c81 upstream.
      
      He Kuang noticed that the trace event samples for arrays were broken:
      
      "The output result of trace_foo_bar event in traceevent samples is
       wrong. This problem can be reproduced as following:
      
        (Build kernel with SAMPLE_TRACE_EVENTS=m)
      
        $ insmod trace-events-sample.ko
      
        $ echo 1 > /sys/kernel/debug/tracing/events/sample-trace/foo_bar/enable
      
        $ cat /sys/kernel/debug/tracing/trace
      
        event-sample-980 [000] ....  43.649559: foo_bar: foo hello 21 0x15
        BIT1|BIT3|0x10 {0x1,0x6f6f6e53,0xff007970,0xffffffff} Snoopy
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       The array length is not right, should be {0x1}.
        (ffffffff,ffffffff)
      
        event-sample-980 [000] ....  44.653827: foo_bar: foo hello 22 0x16
        BIT2|BIT3|0x10
        {0x1,0x2,0x646e6147,0x666c61,0xffffffff,0xffffffff,0x750aeffe,0x7}
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       The array length is not right, should be {0x1,0x2}.
        Gandalf (ffffffff,ffffffff)"
      
      This was caused by an update to have __print_array()'s second parameter
      be the count of items in the array and not the size of the array.
      
      As there are already users of __print_array(), it cannot change. But
      the sample code can and we can also improve on the documentation about
      __print_array() and __get_dynamic_array_len().
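
The count-versus-size mix-up can be illustrated outside the kernel. The helpers below are hypothetical and only mimic the idea of a printer that takes an element count; they are not the tracing macros. The sample data reuses the values from the report above and assumes 4-byte elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Convert a dynamic array's length in bytes into a number of elements. */
static size_t array_count(size_t len_bytes, size_t el_size)
{
    assert(el_size != 0);
    return len_bytes / el_size;
}

/*
 * Hypothetical printer that, like the interface described above, takes an
 * element COUNT. Returns the number of characters written, or -1 if the
 * output buffer is too small.
 */
static int format_array(char *out, size_t out_len,
                        const unsigned int *arr, size_t count)
{
    size_t pos = 0;
    int n = snprintf(out, out_len, "{");

    if (n < 0 || (size_t)n >= out_len)
        return -1;
    pos = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        n = snprintf(out + pos, out_len - pos, "%s0x%x", i ? "," : "", arr[i]);
        if (n < 0 || (size_t)n >= out_len - pos)
            return -1;
        pos += (size_t)n;
    }

    n = snprintf(out + pos, out_len - pos, "}");
    if (n < 0 || (size_t)n >= out_len - pos)
        return -1;
    return (int)(pos + (size_t)n);
}
```

Passing the byte length of a one-element array (4 on a platform with 4-byte `unsigned int`) as the count prints four values, the first valid and the rest whatever follows in memory. Converting with `array_count()` first prints only `{0x1}`.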
      
      Link: http://lkml.kernel.org/r/1436839171-31527-2-git-send-email-hekuang@huawei.com
      
      Fixes: ac01ce14 ("tracing: Make ftrace_print_array_seq compute buf_len")
      Reported-by: He Kuang <hekuang@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      63544f7d
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Have branch tracer use recursive field of task struct · 624dda42
      Steven Rostedt (Red Hat) authored
      commit 6224beb1 upstream.
      
      Fengguang Wu's tests triggered a bug in the branch tracer's start up
      test when CONFIG_DEBUG_PREEMPT is set. This was because that config
      adds some debug logic in the per cpu field, which calls back into
      the branch tracer.
      
      The branch tracer has its own recursive checks, but uses a per cpu
      variable to implement it. If retrieving the per cpu variable calls
      back into the branch tracer, you can see how things will break.
      
      Instead of using a per cpu variable, use the trace_recursion field
      of the current task struct. Simply set a bit when entering the
      branch tracing and clear it when leaving. If the bit is set on
      entry, just don't do the tracing.
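
The set-on-entry, clear-on-exit guard can be sketched as follows. `struct task`, `TRACE_BRANCH_BIT` and `trace_branch()` are hypothetical names for illustration; only the idea of a recursion bit kept in the task structure comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_BRANCH_BIT (1UL << 0)   /* hypothetical bit position */

/* Hypothetical stand-in for the task struct with its recursion field. */
struct task {
    unsigned long trace_recursion;
    int events;                       /* number of branches actually traced */
};

/*
 * Hypothetical tracing hook. 'reenter' simulates code on the tracing path
 * that calls back into the tracer.
 */
static void trace_branch(struct task *t, int reenter)
{
    assert(t != NULL);
    if (t->trace_recursion & TRACE_BRANCH_BIT)
        return;                       /* already tracing on this task: skip */

    t->trace_recursion |= TRACE_BRANCH_BIT;
    t->events++;
    if (reenter)
        trace_branch(t, reenter);     /* nested call returns immediately */
    t->trace_recursion &= ~TRACE_BRANCH_BIT;
}
```

Because the flag lives in the task itself, checking it does not require any helper that might itself be traced, which is the property the per cpu variable lacked.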
      
      There's also the case with lockdep, as the local_irq_save() called
      before the recursion can also trigger code that can call back into
      the function. Changing that to a raw_local_irq_save() will protect
      that as well.
      
      This prevents the recursion and the inevitable crash that follows.
      
      Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      624dda42
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix typo from "static inlin" to "static inline" · 2161c867
      Steven Rostedt (Red Hat) authored
      commit cc9e4bde upstream.
      
      The trace.h header when called without CONFIG_EVENT_TRACING enabled
      (seldom done), will not compile because of a typo in the prototype
      of trace_event_enum_update().
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2161c867
    • Steven Rostedt (Red Hat)'s avatar
      tracing/filter: Do not allow infix to exceed end of string · a27274be
      Steven Rostedt (Red Hat) authored
      commit 6b88f44e upstream.
      
      While debugging a WARN_ON() for filtering, I found that it is possible
      for the filter string to be referenced after its end. With the filter:
      
       # echo '>' > /sys/kernel/debug/tracing/events/ext4/ext4_truncate_exit/filter
      
      The filter_parse() function can call infix_get_op() which calls
      infix_advance() that updates the infix filter pointers for the cnt
      and tail without checking if the filter is already at the end, which
      will put the cnt to zero and the tail beyond the end. The loop then calls
      infix_next() that has
      
      	ps->infix.cnt--;
      	return ps->infix.string[ps->infix.tail++];
      
      The cnt will now be below zero, and the tail that is returned is
      already past the end of the filter string. So far the allocation
      of the filter string usually has some buffer that is zeroed out, but
      if the filter string is of the exact size of the allocated buffer
      there's no guarantee that the character after the nul terminating
      character will be zero.
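
The overrun can be modelled with a small cursor structure. This is a sketch under assumptions: the struct mirrors the `cnt` and `tail` fields mentioned above, but the function names with `_unchecked` and `_checked` suffixes are hypothetical, and the guard shown is one plausible way to stop at the end of the string rather than a copy of the actual patch.

```c
#include <assert.h>
#include <string.h>

/* Cursor over a filter string: 'cnt' characters remain, 'tail' is the read index. */
struct infix {
    const char *string;
    int cnt;
    int tail;
};

static void infix_setup(struct infix *in, const char *s)
{
    assert(in != NULL && s != NULL);
    in->string = s;
    in->cnt = (int)strlen(s);
    in->tail = 0;
}

/* Advance without checking: can move 'tail' past the terminating nul. */
static void infix_advance_unchecked(struct infix *in)
{
    in->cnt--;
    in->tail++;
}

/* Guarded advance: does nothing once the string is exhausted. */
static void infix_advance_checked(struct infix *in)
{
    if (!in->cnt)
        return;
    in->cnt--;
    in->tail++;
}

/* Guarded read: returns 0 at the end instead of reading beyond it. */
static char infix_next_checked(struct infix *in)
{
    if (!in->cnt)
        return 0;
    in->cnt--;
    return in->string[in->tail++];
}
```

With the one-character filter `>`, reading the operator leaves `cnt` at 0 and `tail` at 1. An unchecked advance then yields `cnt == -1` and `tail == 2`, one position past the nul terminator, while the guarded version leaves both values untouched.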
      
      Luckily, only root can write to the filter.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a27274be