1. 03 Aug, 2015 40 commits
    • Jiang Liu's avatar
      ACPI / PCI: Fix regressions caused by resource_size_t overflow with 32-bit kernel · 1d7a398b
      Jiang Liu authored
      commit 1fb01ca9 upstream.
      
      Zoltan Boszormenyi reported this regression:
        "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
         1565:230e) network chip on the mainboard. After the r8169 driver loaded
         the IRQs in the machine went berserk. Keyboard keypressed arrived with
         considerable latency and duplicated, so no real work was possible.
         The machine responded to the power button but didn't actually power
         down. It just stuck at the powering down message. I had to press the
         power button for 4 seconds to power it down.
      
         The computer is a POS machine with a big battery inside. Because of this,
         either ACPI or the Realtek chip kept the bad state and after rebooting,
         the network chip didn't even show up in lspci. Not even the PXE ROM
         announced itself during boot. I had to disconnect the battery to beat
         some sense back to the computer.
      
         The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
         good."
      
      The regression is caused by commit 593669c2 (x86/PCI/ACPI: Use common
      ACPI resource interfaces to simplify implementation). Since commit
      593669c2, x86 PCI ACPI host bridge driver validates ACPI resources by
      first converting an ACPI resource to a 'struct resource' structure and
      then applying checks against the converted resource structure. The 'start'
      and 'end' fields in 'struct resource' are defined to be type of
      resource_size_t, which may be 32 bits or 64 bits depending on
      CONFIG_PHYS_ADDR_T_64BIT.
      
      This may cause incorrect resource validation results with 32-bit kernels
      because 64-bit ACPI resource descriptors may get truncated when converting
      to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
      affects PCI resource allocation subsystem and makes some PCI devices and
      the system behave abnormally due to incorrect resource assignment.
      
      So enhance the ACPI resource parsing interfaces to ignore ACPI resource
      descriptors with address/offset above 4G when running in 32-bit mode.
      
      With the fix applied, the behavior of the machine was restored to how
      3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
      and lspci -vvxxx shows that everything is at the same memory window as
      they were with 3.18.16.
      Reported-and-tested-by: default avatarBoszormenyi Zoltan <zboszor@pr.hu>
      Fixes: 593669c2 (x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation)
      Signed-off-by: default avatarJiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d7a398b
    • Lv Zheng's avatar
      ACPICA: Tables: Enable default 64-bit FADT addresses favor · 24b2b68e
      Lv Zheng authored
      commit 0ea61381 upstream.
      
      ACPICA commit 4da56eeae0749dfe8491285c1e1fad48f6efafd8
      
      The following commit temporarily disables correct 64-bit FADT addresses
      favor during the period the root cause of the bug is not fixed:
       Commit: 85dbd580
       ACPICA: Tables: Restore old behavor to favor 32-bit FADT addresses.
      
      With enough protections, this patch re-enables 64-bit FADT addresses by
      default. If regressions are reported against such change, this patch should
      be bisected and reverted.
      Note that 64-bit FACS favor and 64-bit firmware waking vector favor are
      excluded by this commit in order not to break OSPMs. Lv Zheng.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021
      Link: https://github.com/acpica/acpica/commit/4da56eeaReported-and-tested-by: default avatarOswald Buddenhagen <ossi@kde.org>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarBob Moore <robert.moore@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24b2b68e
    • Lv Zheng's avatar
      ACPICA: Tables: Fix an issue that FACS initialization is performed twice · c0f23125
      Lv Zheng authored
      commit c04be184 upstream.
      
      ACPICA commit 90f5332a15e9d9ba83831ca700b2b9f708274658
      
      This patch adds a new FACS initialization flag for acpi_tb_initialize().
      acpi_enable_subsystem() might be invoked several times in OS bootup process,
      and we don't want FACS initialization to be invoked twice. Lv Zheng.
      
      Link: https://github.com/acpica/acpica/commit/90f5332aSigned-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarBob Moore <robert.moore@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0f23125
    • Lv Zheng's avatar
      ACPICA: Tables: Enable both 32-bit and 64-bit FACS · b1bce17e
      Lv Zheng authored
      commit c04e1fb4 upstream.
      
      ACPICA commit f7b86f35416e3d1f71c3d816ff5075ddd33ed486
      
      The following commit is reported to have broken s2ram on some platforms:
       Commit: 0249ed24
       ACPICA: Add option to favor 32-bit FADT addresses.
      The platform reports 2 FACS tables (which is not allowed by ACPI
      specification) and the new 32-bit address favor rule forces OSPMs to use
      the FACS table reported via FADT's X_FIRMWARE_CTRL field.
      
      The root cause of the reported bug might be one of the followings:
      1. BIOS may favor the 64-bit firmware waking vector address when the
         version of the FACS is greater than 0 and Linux currently only supports
         resuming from the real mode, so the 64-bit firmware waking vector has
         never been set and might be invalid to BIOS while the commit enables
         higher version FACS.
      2. BIOS may favor the FACS reported via the "FIRMWARE_CTRL" field in the
         FADT while the commit doesn't set the firmware waking vector address of
         the FACS reported by "FIRMWARE_CTRL", it only sets the firware waking
         vector address of the FACS reported by "X_FIRMWARE_CTRL".
      
      This patch excludes the cases that can trigger the bugs caused by the root
      cause 2.
      
      There is no handshaking mechanism can be used by OSPM to tell BIOS which
      FACS is currently used. Thus the FACS reported by "FIRMWARE_CTRL" may still
      be used by BIOS and the 0 value of the 32-bit firmware waking vector might
      trigger such failure.
      
      This patch tries to favor 32bit FACS address in another way where both the
      FACS reported by "FIRMWARE_CTRL" and the FACS reported by "X_FIRMWARE_CTRL"
      are loaded so that further commit can set firmware waking vector in the
      both tables to ensure we can exclude the cases that trigger the bugs caused
      by the root cause 2. The exclusion is split into 2 commits as this commit
      is also useful for dumping more ACPI tables, it won't get reverted when
      such exclusion is no longer necessary. Lv Zheng.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=74021
      Link: https://github.com/acpica/acpica/commit/f7b86f35Reported-and-tested-by: default avatarOswald Buddenhagen <ossi@kde.org>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarBob Moore <robert.moore@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1bce17e
    • Rafael J. Wysocki's avatar
      ACPI / LPSS: Fix up acpi_lpss_create_device() · af3cc772
      Rafael J. Wysocki authored
      commit d3e13ff3 upstream.
      
      Fix a return value (which should be a negative error code) and a
      memory leak (the list allocated by acpi_dev_get_resources() needs
      to be freed on ioremap() errors too) in acpi_lpss_create_device()
      introduced by commit 4483d59e 'ACPI / LPSS: check the result
      of ioremap()'.
      
      Fixes: 4483d59e 'ACPI / LPSS: check the result of ioremap()'
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3cc772
    • Rafael J. Wysocki's avatar
      ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage · 3dfbf877
      Rafael J. Wysocki authored
      commit 0294112e upstream.
      
      This effectively reverts the following three commits:
      
       7bc10388 ACPI / resources: free memory on error in add_region_before()
       0f1b414d ACPI / PNP: Avoid conflicting resource reservations
       b9a5e5e1 ACPI / init: Fix the ordering of acpi_reserve_resources()
      
      (commit b9a5e5e1 introduced regressions some of which, but not
      all, were addressed by commit 0f1b414d and commit 7bc10388
      was a fixup on top of the latter) and causes ACPI fixed hardware
      resources to be reserved at the fs_initcall_sync stage of system
      initialization.
      
      The story is as follows.  First, a boot regression was reported due
      to an apparent resource reservation ordering change after a commit
      that shouldn't lead to such changes.  Investigation led to the
      conclusion that the problem happened because acpi_reserve_resources()
      was executed at the device_initcall() stage of system initialization
      which wasn't strictly ordered with respect to driver initialization
      (and with respect to the initialization of the pcieport driver in
      particular), so a random change causing the device initcalls to be
      run in a different order might break things.
      
      The response to that was to attempt to run acpi_reserve_resources()
      as soon as we knew that ACPI would be in use (commit b9a5e5e1).
      However, that turned out to be too early, because it caused resource
      reservations made by the PNP system driver to fail on at least one
      system and that failure was addressed by commit 0f1b414d.
      
      That fix still turned out to be insufficient, though, because
      calling acpi_reserve_resources() before the fs_initcall stage of
      system initialization caused a boot regression to happen on the
      eCAFE EC-800-H20G/S netbook.  That meant that we only could call
      acpi_reserve_resources() at the fs_initcall initialization stage
      or later, but then we might just as well call it after the PNP
      initalization in which case commit 0f1b414d wouldn't be
      necessary any more.
      
      For this reason, the changes made by commit 0f1b414d are reverted
      (along with a memory leak fixup on top of that commit), the changes
      made by commit b9a5e5e1 that went too far are reverted too and
      acpi_reserve_resources() is changed into fs_initcall_sync, which
      will cause it to be executed after the PNP subsystem initialization
      (which is an fs_initcall) and before device initcalls (including
      the pcieport driver initialization) which should avoid the initial
      issue.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
      Link: http://marc.info/?t=143092384600002&r=1&w=2
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
      Link: http://marc.info/?t=143389402600001&r=1&w=2
      Fixes: b9a5e5e1 "ACPI / init: Fix the ordering of acpi_reserve_resources()"
      Reported-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3dfbf877
    • Dan Carpenter's avatar
      ACPI / resources: free memory on error in add_region_before() · 2dfdaa26
      Dan Carpenter authored
      commit 7bc10388 upstream.
      
      There is a small memory leak on error.
      
      Fixes: 0f1b414d (ACPI / PNP: Avoid conflicting resource reservations)
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2dfdaa26
    • Ilya Dryomov's avatar
      crush: fix a bug in tree bucket decode · 94fc3084
      Ilya Dryomov authored
      commit 82cd003a upstream.
      
      struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
      should be used.  -Wconversion catches this, but I guess it went
      unnoticed in all the noise it spews.  The actual problem (at least for
      common crushmaps) isn't the u32 -> u8 truncation though - it's the
      advancement by 4 bytes instead of 1 in the crushmap buffer.
      
      Fixes: http://tracker.ceph.com/issues/2759Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJosh Durgin <jdurgin@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94fc3084
    • Miklos Szeredi's avatar
      fuse: initialize fc->release before calling it · 650b07ba
      Miklos Szeredi authored
      commit 0ad0b325 upstream.
      
      fc->release is called from fuse_conn_put() which was used in the error
      cleanup before fc->release was initialized.
      
      [Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
      fuse_conn_init(fc) instead of before.]
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Fixes: a325f9b9 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      650b07ba
    • Stephen Smalley's avatar
      selinux: fix mprotect PROT_EXEC regression caused by mm change · 872d2790
      Stephen Smalley authored
      commit 892e8cac upstream.
      
      commit 66fc1303 ("mm: shmem_zero_setup
      skip security check and lockdep conflict with XFS") caused a regression
      for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
      shared anonymous mappings.  However, even before that regression, the
      checking on such mprotect PROT_EXEC calls was inconsistent with the
      checking on a mmap PROT_EXEC call for a shared anonymous mapping.  On a
      mmap, the security hook is passed a NULL file and knows it is dealing
      with an anonymous mapping and therefore applies an execmem check and no
      file checks.  On a mprotect, the security hook is passed a vma with a
      non-NULL vm_file (as this was set from the internally-created shmem
      file during mmap) and therefore applies the file-based execute check
      and no execmem check.  Since the aforementioned commit now marks the
      shmem zero inode with the S_PRIVATE flag, the file checks are disabled
      and we have no checking at all on mprotect PROT_EXEC.  Add a test to
      the mprotect hook logic for such private inodes, and apply an execmem
      check in that case.  This makes the mmap and mprotect checking
      consistent for shared anonymous mappings, as well as for /dev/zero and
      ashmem.
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      872d2790
    • Paul Moore's avatar
      selinux: don't waste ebitmap space when importing NetLabel categories · 9d680e03
      Paul Moore authored
      commit 33246035 upstream.
      
      At present we don't create efficient ebitmaps when importing NetLabel
      category bitmaps.  This can present a problem when comparing ebitmaps
      since ebitmap_cmp() is very strict about these things and considers
      these wasteful ebitmaps not equal when compared to their more
      efficient counterparts, even if their values are the same.  This isn't
      likely to cause problems on 64-bit systems due to a bit of luck on
      how NetLabel/CIPSO works and the default ebitmap size, but it can be
      a problem on 32-bit systems.
      
      This patch fixes this problem by being a bit more intelligent when
      importing NetLabel category bitmaps by skipping over empty sections
      which should result in a nice, efficient ebitmap.
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d680e03
    • Filipe Manana's avatar
      Btrfs: fix file corruption after cloning inline extents · df7c9ca8
      Filipe Manana authored
      commit ed958762 upstream.
      
      Using the clone ioctl (or extent_same ioctl, which calls the same extent
      cloning function as well) we end up allowing copy an inline extent from
      the source file into a non-zero offset of the destination file. This is
      something not expected and that the btrfs code is not prepared to deal
      with - all inline extents must be at a file offset equals to 0.
      
      For example, the following excerpt of a test case for fstests triggers
      a crash/BUG_ON() on a write operation after an inline extent is cloned
      into a non-zero offset:
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test files. File foo has the same 2K of data at offset 4K
        # as file bar has at its offset 0.
        $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
            -c "pwrite -S 0xbb 4k 2K" \
            -c "pwrite -S 0xcc 8K 4K" \
            $SCRATCH_MNT/foo | _filter_xfs_io
      
        # File bar consists of a single inline extent (2K size).
        $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
           $SCRATCH_MNT/bar | _filter_xfs_io
      
        # Now call the clone ioctl to clone the extent of file bar into file
        # foo at its offset 4K. This made file foo have an inline extent at
        # offset 4K, something which the btrfs code can not deal with in future
        # IO operations because all inline extents are supposed to start at an
        # offset of 0, resulting in all sorts of chaos.
        # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
        # what it returns for other cases dealing with inlined extents.
        $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
            $SCRATCH_MNT/bar $SCRATCH_MNT/foo
      
        # Because of the inline extent at offset 4K, the following write made
        # the kernel crash with a BUG_ON().
        $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
      
        status=0
        exit
      
      The stack trace of the BUG_ON() triggered by the last write is:
      
        [152154.035903] ------------[ cut here ]------------
        [152154.036424] kernel BUG at mm/page-writeback.c:2286!
        [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
        [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
        [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
        [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
        [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
        [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
        [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
        [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
        [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
        [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
        [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
        [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
        [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
        [152154.036424] Stack:
        [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
        [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
        [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
        [152154.036424] Call Trace:
        [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
        [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
        [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
        [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
        [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
        [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
        [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
        [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
        [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
        [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
        [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
        [152154.036424]  RSP <ffff880429effc68>
        [152154.242621] ---[ end trace e3d3376b23a57041 ]---
      
      Fix this by returning the error EOPNOTSUPP if an attempt to copy an
      inline extent into a non-zero offset happens, just like what is done for
      other scenarios that would require copying/splitting inline extents,
      which were introduced by the following commits:
      
         00fdf13a ("Btrfs: fix a crash of clone with inline extents's split")
         3f9e3df8 ("btrfs: replace error code from btrfs_drop_extents")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df7c9ca8
    • Filipe Manana's avatar
      Btrfs: fix list transaction->pending_ordered corruption · 98f7bfe6
      Filipe Manana authored
      commit d3efe084 upstream.
      
      When we call btrfs_commit_transaction(), we splice the list "ordered"
      of our transaction handle into the transaction's "pending_ordered"
      list, but we don't re-initialize the "ordered" list of our transaction
      handle, this means it still points to the same elements it used to
      before the splice. Then we check if the current transaction's state is
      >= TRANS_STATE_COMMIT_START and if it is we end up calling
      btrfs_end_transaction() which simply splices again the "ordered" list
      of our handle into the transaction's "pending_ordered" list, leaving
      multiple pointers to the same ordered extents which results in list
      corruption when we are iterating, removing and freeing ordered extents
      at btrfs_wait_pending_ordered(), resulting in access to dangling
      pointers / use-after-free issues.
      Similarly, btrfs_end_transaction() can end up in some cases calling
      btrfs_commit_transaction(), and both did a list splice of the transaction
      handle's "ordered" list into the transaction's "pending_ordered" without
      re-initializing the handle's "ordered" list, resulting in exactly the
      same problem.
      
      This produces the following warning on a kernel with linked list
      debugging enabled:
      
      [109749.265416] ------------[ cut here ]------------
      [109749.266410] WARNING: CPU: 7 PID: 324 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
      [109749.267969] list_del corruption. prev->next should be ffff8800ba087e20, but was fffffff8c1f7c35d
      (...)
      [109749.287505] Call Trace:
      [109749.288135]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
      [109749.298080]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
      [109749.331605]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
      [109749.334849]  [<ffffffff81260642>] ? __list_del_entry+0x5a/0x98
      [109749.337093]  [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
      [109749.337847]  [<ffffffff81260642>] __list_del_entry+0x5a/0x98
      [109749.338678]  [<ffffffffa053e8bf>] btrfs_wait_pending_ordered+0x46/0xdb [btrfs]
      [109749.340145]  [<ffffffffa058a65f>] ? __btrfs_run_delayed_items+0x149/0x163 [btrfs]
      [109749.348313]  [<ffffffffa054077d>] btrfs_commit_transaction+0x36b/0xa10 [btrfs]
      [109749.349745]  [<ffffffff81087310>] ? trace_hardirqs_on+0xd/0xf
      [109749.350819]  [<ffffffffa055370d>] btrfs_sync_file+0x36f/0x3fc [btrfs]
      [109749.351976]  [<ffffffff8118ec98>] vfs_fsync_range+0x8f/0x9e
      [109749.360341]  [<ffffffff8118ecc3>] vfs_fsync+0x1c/0x1e
      [109749.368828]  [<ffffffff8118ee1d>] do_fsync+0x34/0x4e
      [109749.369790]  [<ffffffff8118f045>] SyS_fsync+0x10/0x14
      [109749.370925]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
      [109749.382274] ---[ end trace 48e0d07f7c03d95a ]---
      
      On a non-debug kernel this leads to invalid memory accesses, causing a
      crash. Fix this by using list_splice_init() instead of list_splice() in
      btrfs_commit_transaction() and btrfs_end_transaction().
      
      Fixes: 50d9aa99 ("Btrfs: make sure logged extents complete in the current transaction V3"
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98f7bfe6
    • Filipe Manana's avatar
      Btrfs: fix memory leak in the extent_same ioctl · 992a3fbb
      Filipe Manana authored
      commit 497b4050 upstream.
      
      We were allocating memory with memdup_user() but we were never releasing
      that memory. This affected pretty much every call to the ioctl, whether
      it deduplicated extents or not.
      
      This issue was reported on IRC by Julian Taylor and on the mailing list
      by Marcel Ritter, credit goes to them for finding the issue.
      Reported-by: default avatarJulian Taylor <jtaylor.debian@googlemail.com>
      Reported-by: default avatarMarcel Ritter <ritter.marcel@gmail.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      992a3fbb
    • Filipe Manana's avatar
      Btrfs: fix fsync data loss after append write · 544f8fbe
      Filipe Manana authored
      commit e4545de5 upstream.
      
      If we do an append write to a file (which increases its inode's i_size)
      that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
      and the previous transaction added a new hard link to the file, which sets
      the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
      the file, the inode's new i_size isn't logged. This has the consequence
      that after the fsync log is replayed, the file size remains what it was
      before the append write operation, which means users/applications will
      not be able to read the data that was successsfully fsync'ed before.
      
      This happens because neither the inode item nor the delayed inode get
      their i_size updated when the append write is made - doing so would
      require starting a transaction in the buffered write path, something that
      we do not do intentionally for performance reasons.
      
      Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
      set the inode is logged with its current i_size (log the in-memory inode
      into the log tree).
      
      This issue is not a recent regression and is easy to reproduce with the
      following test case for fstests:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        here=`pwd`
        tmp=/tmp/$$
        status=1	# failure is the default!
      
        _cleanup()
        {
                _cleanup_flakey
                rm -f $tmp.*
        }
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _supported_fs generic
        _supported_os Linux
        _need_to_be_root
        _require_scratch
        _require_dm_flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        _crash_and_mount()
        {
                # Simulate a crash/power loss.
                _load_flakey_table $FLAKEY_DROP_WRITES
                _unmount_flakey
                # Allow writes again and mount. This makes the fs replay its fsync log.
                _load_flakey_table $FLAKEY_ALLOW_WRITES
                _mount_flakey
        }
      
        rm -f $seqres.full
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create the test file with some initial data and then fsync it.
        # The fsync here is only needed to trigger the issue in btrfs, as it causes the
        # the flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
                        -c "fsync" \
                        $SCRATCH_MNT/foo | _filter_xfs_io
        sync
      
        # Add a hard link to our file.
        # On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
        # which is a necessary condition to trigger the issue.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar
      
        # Sync the filesystem to force a commit of the current btrfs transaction, this
        # is a necessary condition to trigger the bug on btrfs.
        sync
      
        # Now append more data to our file, increasing its size, and fsync the file.
        # In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
        # write path did not update the inode item in the btree nor the delayed inode
        # item (in memory struture) in the current transaction (created by the fsync
        # handler), the fsync did not record the inode's new i_size in the fsync
        # log/journal. This made the data unavailable after the fsync log/journal is
        # replayed.
        $XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
                     -c "fsync" \
                     $SCRATCH_MNT/foo | _filter_xfs_io
      
        echo "File content after fsync and before crash:"
        od -t x1 $SCRATCH_MNT/foo
      
        _crash_and_mount
      
        echo "File content after crash and log replay:"
        od -t x1 $SCRATCH_MNT/foo
      
        status=0
        exit
      
      The expected file output before and after the crash/power failure expects the
      appended data to be available, which is:
      
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
        *
        0200000
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      544f8fbe
    • Filipe Manana's avatar
      Btrfs: fix race between caching kthread and returning inode to inode cache · 9547e86b
      Filipe Manana authored
      commit ae9d8f17 upstream.
      
      While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
      we could have a concurrent call to btrfs_return_ino() that adds a new
      entry to the root's free space cache of pinned inodes. This concurrent
      call does not acquire the fs_info->commit_root_sem before adding a new
      entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
      because the caching kthread calls btrfs_unpin_free_ino() after setting
      the caching state to BTRFS_CACHE_FINISHED and therefore races with
      the task calling btrfs_return_ino(), which is adding a new entry, while
      the former (caching kthread) is navigating the cache's rbtree, removing
      and freeing nodes from the cache's rbtree without acquiring the spinlock
      that protects the rbtree.
      
      This race resulted in memory corruption due to double free of struct
      btrfs_free_space objects because both tasks can end up doing freeing the
      same objects. Note that adding a new entry can result in merging it with
      other entries in the cache, in which case those entries are freed.
      This is particularly important as btrfs_free_space structures are also
      used for the block group free space caches.
      
      This memory corruption can be detected by a debugging kernel, which
      reports it with the following trace:
      
      [132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
      [132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
      [132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
      [132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
      [132408.505075] Call Trace:
      [132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
      [132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
      [132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
      [132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
      [132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
      [132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
      [132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
      [132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
      [132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
      [132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
      [132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
      [132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
      [132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
      [132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
      [132409.503355] ------------[ cut here ]------------
      [132409.504241] kernel BUG at mm/slab.c:2571!
      
      Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
      that protects the rbtree while doing the searches and removing entries.
      
      Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9547e86b
    • Filipe Manana's avatar
      Btrfs: use kmem_cache_free when freeing entry in inode cache · 6f953ad8
      Filipe Manana authored
      commit c3f4a168 upstream.
      
      The free space entries are allocated using kmem_cache_zalloc(),
      through __btrfs_add_free_space(), therefore we should use
      kmem_cache_free() and not kfree() to avoid any confusion and
      any potential problem. Looking at the kfree() definition at
      mm/slab.c it has the following comment:
      
        /*
         * (...)
         *
         * Don't free memory not originally allocated by kmalloc()
         * or you will run into trouble.
         */
      
      So better be safe and use kmem_cache_free().
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f953ad8
    • Firo Yang's avatar
      md: fix a build warning · 528feaea
      Firo Yang authored
      commit 4e023612 upstream.
      
      Warning like this:
      
      drivers/md/md.c: In function "update_array_info":
      drivers/md/md.c:6394:26: warning: logical not is only applied
      to the left hand side of comparison [-Wlogical-not-parentheses]
            !mddev->persistent  != info->not_persistent||
      
      Fix it as Neil Brown said:
      mddev->persistent != !info->not_persistent ||
      Signed-off-by: default avatarFiro Yang <firogm@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      528feaea
    • Omar Sandoval's avatar
      Btrfs: don't invalidate root dentry when subvolume deletion fails · 54b1fb57
      Omar Sandoval authored
      commit 64ad6c48 upstream.
      
      Since commit bafc9b75 ("vfs: More precise tests in d_invalidate"),
      mounted subvolumes can be deleted because d_invalidate() won't fail.
      However, we run into problems when we attempt to delete the default
      subvolume while it is mounted as the root filesystem:
      
      	# btrfs subvol list /
      	ID 257 gen 306 top level 5 path rootvol
      	ID 267 gen 334 top level 5 path snap1
      	# btrfs subvol get-default /
      	ID 267 gen 334 top level 5 path snap1
      	# btrfs inspect-internal rootid /
      	267
      	# mount -o subvol=/ /dev/vda1 /mnt
      	# btrfs subvol del /mnt/snap1
      	Delete subvolume (no-commit): '/mnt/snap1'
      	ERROR: cannot delete '/mnt/snap1' - Operation not permitted
      	# findmnt /
      	findmnt: can't read /proc/mounts: No such file or directory
      	# ls /proc
      	#
      
      Markus reported that this same scenario simply led to a kernel oops.
      
      This happens because in btrfs_ioctl_snap_destroy(), we call
      d_invalidate() before we check may_destroy_subvol(), which means that we
      detach the submounts and drop the dentry before erroring out. Instead,
      we should only invalidate the dentry once the deletion has succeeded.
      Additionally, the shrink_dcache_sb() isn't necessary; d_invalidate()
      will prune the dcache for the deleted subvolume.
      
      Fixes: bafc9b75 ("vfs: More precise tests in d_invalidate")
      Reported-by: default avatarMarkus Schauler <mschauler@gmail.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@osandov.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54b1fb57
    • Stefan Wahren's avatar
      ARM: dts: mx23: fix iio-hwmon support · 83719f40
      Stefan Wahren authored
      commit e8e94ed6 upstream.
      
      In order to get iio-hwmon support, the lradc must be declared as an
      iio provider. So fix this issue by adding the #io-channel-cells property.
      Signed-off-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Fixes: bd798f9c ("ARM: dts: mxs: Add iio-hwmon to mx23 soc")
      Reviewed-by: default avatarMarek Vasut <marex@denx.de>
      Reviewed-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83719f40
    • Constantine Shulyupin's avatar
      hwmon: (nct7802) fix visibility of temp3 · 2618fae8
      Constantine Shulyupin authored
      commit 56172d81 upstream.
      
      Excerpt from datasheet:
      7.2.32 Mode Selection Register
      RTD3_MD : 00=Closed , 01=Reserved , 10=Thermistor mode , 11=Voltage sense
      
      Show temp3 only in Thermistor mode
      Signed-off-by: default avatarConstantine Shulyupin <const@MakeLinux.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2618fae8
    • Stevens, Nick's avatar
      hwmon: (mcp3021) Fix broken output scaling · f9440235
      Stevens, Nick authored
      commit 347d7e45 upstream.
      
      The mcp3021 scaling code is dividing the VDD (full-scale) value in
      millivolts by the A2D resolution to obtain the scaling factor. When VDD
      is 3300mV (the standard value) and the resolution is 12-bit (4096
      divisions), the result is a scale factor of 3300/4096, which is always
      one.  Effectively, the raw A2D reading is always being returned because
      no scaling is applied.
      
      This patch fixes the issue and simplifies the register-to-volts
      calculation, removing the unneeded "output_scale" struct member.
      Signed-off-by: default avatarNick Stevens <Nick.Stevens@digi.com>
      [Guenter Roeck: Dropped unnecessary value check]
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9440235
    • Goldwyn Rodrigues's avatar
      md: Skip cluster setup for dm-raid · 7640ca52
      Goldwyn Rodrigues authored
      commit d3b178ad upstream.
      
      There is a bug that the bitmap superblock isn't initialised properly for
      dm-raid, so a new field can have garbage in new fields.
      (dm-raid does initialisation in the kernel - md initialised the
       superblock in mdadm).
      
      This means that for dm-raid we cannot currently trust the new ->nodes
      field. So:
       - use __GFP_ZERO to initialise the superblock properly for all new
          arrays
       - initialise all fields in bitmap_info in bitmap_new_disk_sb
       - ignore ->nodes for dm arrays (yes, this is a hack)
      
      This bug exposes dm-raid to bug in the (still experimental) md-cluster
      code, so it is suitable for -stable.  It does cause crashes.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=100491Signed-off-By: default avatarGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7640ca52
    • NeilBrown's avatar
      md: unlock mddev_lock on an error path. · 0f9457af
      NeilBrown authored
      commit 9a8c0fa8 upstream.
      
      This error path retuns while still holding the lock - bad.
      
      Fixes: 6791875e ("md: make reconfig_mutex optional for writes to md sysfs files.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f9457af
    • NeilBrown's avatar
      md: clear mddev->private when it has been freed. · adeb846a
      NeilBrown authored
      commit bd691922 upstream.
      
      If ->private is set when ->run is called, it is assumed to be
      a 'config'  prepared as part of 'reshape'.
      
      So it is important when we free that config, that we also clear ->private.
      This is not often a problem as the mddev will normally be discarded
      shortly after the config us freed.
      However if an 'assemble' races with a final close, the assemble can use
      the old mddev which has a stale ->private.  This leads to any of
      various sorts of crashes.
      
      So clear ->private after calling ->free().
      Reported-by: default avatarNate Clark <nate@neworld.us>
      Fixes: afa0f557 ("md: rename ->stop to ->free")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      adeb846a
    • Lior Amsalem's avatar
      dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup · 499b1532
      Lior Amsalem authored
      commit 9136291f upstream.
      
      This patch fixes a bug in the XOR driver where the cleanup function can be
      called and free descriptors that never been processed by the engine (which
      result in data errors).
      
      The cleanup function will free descriptors based on the ownership bit in
      the descriptors.
      
      Fixes: ff7b0479 ("dmaengine: DMA engine driver for Marvell XOR engine")
      Signed-off-by: default avatarLior Amsalem <alior@marvell.com>
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Reviewed-by: default avatarOfer Heifetz <oferh@marvell.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      499b1532
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix sample output of dynamic arrays · 63544f7d
      Steven Rostedt (Red Hat) authored
      commit d6726c81 upstream.
      
      He Kuang noticed that the trace event samples for arrays was broken:
      
      "The output result of trace_foo_bar event in traceevent samples is
       wrong. This problem can be reproduced as following:
      
        (Build kernel with SAMPLE_TRACE_EVENTS=m)
      
        $ insmod trace-events-sample.ko
      
        $ echo 1 > /sys/kernel/debug/tracing/events/sample-trace/foo_bar/enable
      
        $ cat /sys/kernel/debug/tracing/trace
      
        event-sample-980 [000] ....  43.649559: foo_bar: foo hello 21 0x15
        BIT1|BIT3|0x10 {0x1,0x6f6f6e53,0xff007970,0xffffffff} Snoopy
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       The array length is not right, should be {0x1}.
        (ffffffff,ffffffff)
      
        event-sample-980 [000] ....  44.653827: foo_bar: foo hello 22 0x16
        BIT2|BIT3|0x10
        {0x1,0x2,0x646e6147,0x666c61,0xffffffff,0xffffffff,0x750aeffe,0x7}
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       The array length is not right, should be {0x1,0x2}.
        Gandalf (ffffffff,ffffffff)"
      
      This was caused by an update to have __print_array()'s second parameter
      be the count of items in the array and not the size of the array.
      
      As there is already users of __print_array(), it can not change. But
      the sample code can and we can also improve on the documentation about
      __print_array() and __get_dynamic_array_len().
      
      Link: http://lkml.kernel.org/r/1436839171-31527-2-git-send-email-hekuang@huawei.com
      
      Fixes: ac01ce14 ("tracing: Make ftrace_print_array_seq compute buf_len")
      Reported-by: default avatarHe Kuang <hekuang@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63544f7d
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Have branch tracer use recursive field of task struct · 624dda42
      Steven Rostedt (Red Hat) authored
      commit 6224beb1 upstream.
      
      Fengguang Wu's tests triggered a bug in the branch tracer's start up
      test when CONFIG_DEBUG_PREEMPT set. This was because that config
      adds some debug logic in the per cpu field, which calls back into
      the branch tracer.
      
      The branch tracer has its own recursive checks, but uses a per cpu
      variable to implement it. If retrieving the per cpu variable calls
      back into the branch tracer, you can see how things will break.
      
      Instead of using a per cpu variable, use the trace_recursion field
      of the current task struct. Simply set a bit when entering the
      branch tracing and clear it when leaving. If the bit is set on
      entry, just don't do the tracing.
      
      There's also the case with lockdep, as the local_irq_save() called
      before the recursion can also trigger code that can call back into
      the function. Changing that to a raw_local_irq_save() will protect
      that as well.
      
      This prevents the recursion and the inevitable crash that follows.
      
      Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.comReported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Tested-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      624dda42
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix typo from "static inlin" to "static inline" · 2161c867
      Steven Rostedt (Red Hat) authored
      commit cc9e4bde upstream.
      
      The trace.h header when called without CONFIG_EVENT_TRACING enabled
      (seldom done), will not compile because of a typo in the protocol
      of trace_event_enum_update().
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2161c867
    • Steven Rostedt (Red Hat)'s avatar
      tracing/filter: Do not allow infix to exceed end of string · a27274be
      Steven Rostedt (Red Hat) authored
      commit 6b88f44e upstream.
      
      While debugging a WARN_ON() for filtering, I found that it is possible
      for the filter string to be referenced after its end. With the filter:
      
       # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
      
      The filter_parse() function can call infix_get_op() which calls
      infix_advance() that updates the infix filter pointers for the cnt
      and tail without checking if the filter is already at the end, which
      will put the cnt to zero and the tail beyond the end. The loop then calls
      infix_next() that has
      
      	ps->infix.cnt--;
      	return ps->infix.string[ps->infix.tail++];
      
      The cnt will now be below zero, and the tail that is returned is
      already passed the end of the filter string. So far the allocation
      of the filter string usually has some buffer that is zeroed out, but
      if the filter string is of the exact size of the allocated buffer
      there's no guarantee that the charater after the nul terminating
      character will be zero.
      
      Luckily, only root can write to the filter.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a27274be
    • Steven Rostedt (Red Hat)'s avatar
      tracing/filter: Do not WARN on operand count going below zero · baa7b462
      Steven Rostedt (Red Hat) authored
      commit b4875bbe upstream.
      
      When testing the fix for the trace filter, I could not come up with
      a scenario where the operand count goes below zero, so I added a
      WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
      that it can happen (although the filter would be wrong).
      
       # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
      
      That is, a single operation without any operands will hit the path
      where the WARN_ON_ONCE() can trigger. Although this is harmless,
      and the filter is reported as a error. But instead of spitting out
      a warning to the kernel dmesg, just fail nicely and report it via
      the proper channels.
      
      Link: http://lkml.kernel.org/r/558C6082.90608@oracle.comReported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baa7b462
    • Mimi Zohar's avatar
      ima: update builtin policies · 66963999
      Mimi Zohar authored
      commit 24fd03c8 upstream.
      
      This patch defines a builtin measurement policy "tcb", similar to the
      existing "ima_tcb", but with additional rules to also measure files
      based on the effective uid and to measure files opened with the "read"
      mode bit set (eg. read, read-write).
      
      Changing the builtin "ima_tcb" policy could potentially break existing
      users.  Instead of defining a new separate boot command line option each
      time the builtin measurement policy is modified, this patch defines a
      single generic boot command line option "ima_policy=" to specify the
      builtin policy and deprecates the use of the builtin ima_tcb policy.
      
      [The "ima_policy=" boot command line option is based on Roberto Sassu's
      "ima: added new policy type exec" patch.]
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarDr. Greg Wettstein <gw@idfusion.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66963999
    • Mimi Zohar's avatar
      ima: extend "mask" policy matching support · bf609547
      Mimi Zohar authored
      commit 4351c294 upstream.
      
      The current "mask" policy option matches files opened as MAY_READ,
      MAY_WRITE, MAY_APPEND or MAY_EXEC.  This patch extends the "mask"
      option to match files opened containing one of these modes.  For
      example, "mask=^MAY_READ" would match files opened read-write.
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarDr. Greg Wettstein <gw@idfusion.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf609547
    • Mimi Zohar's avatar
      ima: add support for new "euid" policy condition · 9428e8a3
      Mimi Zohar authored
      commit 139069ef upstream.
      
      The new "euid" policy condition measures files with the specified
      effective uid (euid).  In addition, for CAP_SETUID files it measures
      files with the specified uid or suid.
      
      Changelog:
      - fixed checkpatch.pl warnings
      - fixed avc denied {setuid} messages - based on Roberto's feedback
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarDr. Greg Wettstein <gw@idfusion.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9428e8a3
    • Mimi Zohar's avatar
      ima: fix ima_show_template_data_ascii() · 2b92ad96
      Mimi Zohar authored
      commit 45b26133 upstream.
      
      This patch fixes a bug introduced in "4d7aeee ima: define new template
      ima-ng and template fields d-ng and n-ng".
      
      Changelog:
      - change int to uint32 (Roberto Sassu's suggestion)
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarRoberto Sassu <rsassu@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b92ad96
    • Mimi Zohar's avatar
      evm: labeling pseudo filesystems exception · 60e2874c
      Mimi Zohar authored
      commit 5101a185 upstream.
      
      To prevent offline stripping of existing file xattrs and relabeling of
      them at runtime, EVM allows only newly created files to be labeled.  As
      pseudo filesystems are not persistent, stripping of xattrs is not a
      concern.
      
      Some LSMs defer file labeling on pseudo filesystems.  This patch
      permits the labeling of existing files on pseudo files systems.
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60e2874c
    • Mimi Zohar's avatar
      ima: do not measure or appraise the NSFS filesystem · 0ecc8ea6
      Mimi Zohar authored
      commit cd025f7f upstream.
      
      Include don't appraise or measure rules for the NSFS filesystem
      in the builtin ima_tcb and ima_appraise_tcb policies.
      
      Changelog:
      - Update documentation
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ecc8ea6
    • Dan Carpenter's avatar
      ima: cleanup ima_init_policy() a little · 7869fa60
      Dan Carpenter authored
      commit 5577857f upstream.
      
      It's a bit easier to read this if we split it up into two for loops.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7869fa60
    • Roberto Sassu's avatar
      ima: skip measurement of cgroupfs files and update documentation · 73cc530a
      Roberto Sassu authored
      commit 6438de9f upstream.
      
      This patch adds a rule in the default measurement policy to skip inodes
      in the cgroupfs filesystem. Measurements for this filesystem can be
      avoided, as all the digests collected have the same value of the digest of
      an empty file.
      
      Furthermore, this patch updates the documentation of IMA policies in
      Documentation/ABI/testing/ima_policy to make it consistent with
      the policies set in security/integrity/ima/ima_policy.c.
      Signed-off-by: default avatarRoberto Sassu <rsassu@suse.de>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73cc530a
    • Colin Ian King's avatar
      KEYS: ensure we free the assoc array edit if edit is valid · 4fd5dc9e
      Colin Ian King authored
      commit ca4da5dd upstream.
      
      __key_link_end is not freeing the associated array edit structure
      and this leads to a 512 byte memory leak each time an identical
      existing key is added with add_key().
      
      The reason the add_key() system call returns okay is that
      key_create_or_update() calls __key_link_begin() before checking to see
      whether it can update a key directly rather than adding/replacing - which
      it turns out it can.  Thus __key_link() is not called through
      __key_instantiate_and_link() and __key_link_end() must cancel the edit.
      
      CVE-2015-1333
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fd5dc9e