1. 09 Oct, 2015 6 commits
    • Linus Torvalds's avatar
      Merge tag 'dm-4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 04445556
      Linus Torvalds authored
      Pull dm fixes from Mike Snitzer:
       "Three stable fixes:
      
         - DM core AB-BA deadlock fix in the device destruction path (vs
           device creation's DM table swap).
      
         - DM raid fix to properly round up the region_size to the next
           power-of-2.
      
         - DM cache fix for a NULL pointer seen while switching from the
           "cleaner" cache policy.
      
        Two fixes for regressions introduced during the 4.3 merge:
      
         - request-based DM error propagation regressed due to incorrect
           changes introduced when adding the bi_error field to bio.
      
         - DM snapshot fix to only support snapshots that overflow if the
           client (e.g. lvm2) is prepared to deal with the associated
           snapshot status interface change"
      
      * tag 'dm-4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm snapshot: add new persistent store option to support overflow
        dm cache: fix NULL pointer when switching from cleaner policy
        dm: fix request-based dm error reporting
        dm raid: fix round up of default region size
        dm: fix AB-BA deadlock in __dm_destroy()
      04445556
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 175d58cf
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "These are small and assorted.  Neil's is the oldest, I dropped the
        ball thinking he was going to send it in"
      
      * 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: support NFSv2 export
        Btrfs: open_ctree: Fix possible memory leak
        Btrfs: fix deadlock when finalizing block group creation
        Btrfs: update fix for read corruption of compressed and shared extents
        Btrfs: send, fix corner case for reference overwrite detection
      175d58cf
    • Linus Torvalds's avatar
      Merge tag 'nfsd-4.3-1' of git://linux-nfs.org/~bfields/linux · 38aa0a59
      Linus Torvalds authored
      Pull nfsd bugfix from Bruce Fields:
       "Just one RDMA bugfix"
      
      * tag 'nfsd-4.3-1' of git://linux-nfs.org/~bfields/linux:
        svcrdma: handle rdma read with a non-zero initial page offset
      38aa0a59
    • Linus Torvalds's avatar
      Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 5163ac76
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "The fixes for this week include one small patch that was years in the
        making and that finally fixes using all eight CPUs on exynos542x.
      
        The rest are lots of minor changes for sunxi, imx, exynos and shmobile
      
         - fixing the minimum voltage for Allwinner A20
         - thermal boot issue on SMDK5250.
         - invalid clock used for FIMD IOMMU.
         - audio on Renesas r8a7790/r8a7791
         - invalid clock used for FIMD IOMMU
         - LEDs on exynos5422-odroidxu3-common
         - usb pin control for imx-rex
         - imx53: fix PMIC interrupt level
         - a Makefile typo"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: dts: Fix wrong clock binding for sysmmu_fimd1_1 on exynos5420
        ARM: dts: Fix bootup thermal issue on smdk5250
        ARM: shmobile: r8a7791 dtsi: Add CPG/MSTP Clock Domain for sound
        ARM: shmobile: r8a7790 dtsi: Add CPG/MSTP Clock Domain for sound
        arm-cci500: Don't enable PMU driver by default
        ARM: dts: fix usb pin control for imx-rex dts
        ARM: imx53: qsrb: fix PMIC interrupt level
        ARM: imx53: include IRQ dt-bindings header
        ARM: dts: add suspend opp to exynos4412
        ARM: dts: Fix LEDs on exynos5422-odroidxu3
        ARM: EXYNOS: reset Little cores when cpu is up
        ARM: dts: Fix Makefile target for sun4i-a10-itead-iteaduino-plus
        ARM: dts: sunxi: Raise minimum CPU voltage for sun7i-a20 to meet SoC specifications
      5163ac76
    • Mike Snitzer's avatar
      dm snapshot: add new persistent store option to support overflow · b0d3cc01
      Mike Snitzer authored
      Commit 76c44f6d introduced the possibly for "Overflow" to be reported
      by the snapshot device's status.  Older userspace (e.g. lvm2) does not
      handle the "Overflow" status response.
      
      Fix this incompatibility by requiring newer userspace code, that can
      cope with "Overflow", request the persistent store with overflow support
      by using "PO" (Persistent with Overflow) for the snapshot store type.
      Reported-by: default avatarZdenek Kabelac <zkabelac@redhat.com>
      Fixes: 76c44f6d ("dm snapshot: don't invalidate on-disk image on snapshot write overflow")
      Reviewed-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      b0d3cc01
    • Joe Thornber's avatar
      dm cache: fix NULL pointer when switching from cleaner policy · 2bffa150
      Joe Thornber authored
      The cleaner policy doesn't make use of the per cache block hint space in
      the metadata (unlike the other policies).  When switching from the
      cleaner policy to mq or smq a NULL pointer crash (in dm_tm_new_block)
      was observed.  The crash was caused by bugs in dm-cache-metadata.c
      when trying to skip creation of the hint btree.
      
      The minimal fix is to change hint size for the cleaner policy to 4 bytes
      (only hint size supported).
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      2bffa150
  2. 07 Oct, 2015 11 commits
  3. 06 Oct, 2015 18 commits
  4. 05 Oct, 2015 5 commits
    • Filipe Manana's avatar
      Btrfs: fix deadlock when finalizing block group creation · d9a0540a
      Filipe Manana authored
      Josef ran into a deadlock while a transaction handle was finalizing the
      creation of its block groups, which produced the following trace:
      
        [260445.593112] fio             D ffff88022a9df468     0  8924   4518 0x00000084
        [260445.593119]  ffff88022a9df468 ffffffff81c134c0 ffff880429693c00 ffff88022a9df488
        [260445.593126]  ffff88022a9e0000 ffff8803490d7b00 ffff8803490d7b18 ffff88022a9df4b0
        [260445.593132]  ffff8803490d7af8 ffff88022a9df488 ffffffff8175a437 ffff8803490d7b00
        [260445.593137] Call Trace:
        [260445.593145]  [<ffffffff8175a437>] schedule+0x37/0x80
        [260445.593189]  [<ffffffffa0850f37>] btrfs_tree_lock+0xa7/0x1f0 [btrfs]
        [260445.593197]  [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0
        [260445.593225]  [<ffffffffa07eac44>] btrfs_lock_root_node+0x34/0x50 [btrfs]
        [260445.593253]  [<ffffffffa07eff6b>] btrfs_search_slot+0x88b/0xa00 [btrfs]
        [260445.593295]  [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs]
        [260445.593324]  [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
        [260445.593351]  [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
        [260445.593394]  [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs]
        [260445.593427]  [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs]
        [260445.593459]  [<ffffffffa0800964>] do_chunk_alloc+0x2a4/0x2e0 [btrfs]
        [260445.593491]  [<ffffffffa0803815>] find_free_extent+0xa55/0xd90 [btrfs]
        [260445.593524]  [<ffffffffa0803c22>] btrfs_reserve_extent+0xd2/0x220 [btrfs]
        [260445.593532]  [<ffffffff8119fe5d>] ? account_page_dirtied+0xdd/0x170
        [260445.593564]  [<ffffffffa0803e78>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs]
        [260445.593597]  [<ffffffffa080c9de>] ? btree_set_page_dirty+0xe/0x10 [btrfs]
        [260445.593626]  [<ffffffffa07eb5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs]
        [260445.593654]  [<ffffffffa07ebbff>] btrfs_cow_block+0x11f/0x1c0 [btrfs]
        [260445.593682]  [<ffffffffa07ef8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs]
        [260445.593724]  [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs]
        [260445.593752]  [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
        [260445.593830]  [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
        [260445.593905]  [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs]
        [260445.593946]  [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs]
        [260445.593990]  [<ffffffffa0815798>] btrfs_commit_transaction+0xa8/0xb40 [btrfs]
        [260445.594042]  [<ffffffffa085abcd>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs]
        [260445.594089]  [<ffffffffa082bc84>] btrfs_sync_file+0x294/0x350 [btrfs]
        [260445.594115]  [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0
        [260445.594133]  [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180
        [260445.594149]  [<ffffffff8123e35d>] do_fsync+0x3d/0x70
        [260445.594169]  [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110
        [260445.594187]  [<ffffffff8123e600>] SyS_fsync+0x10/0x20
        [260445.594204]  [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      This happened because the same transaction handle created a large number
      of block groups and while finalizing their creation (inserting new items
      and updating existing items in the chunk and device trees) a new metadata
      extent had to be allocated and no free space was found in the current
      metadata block groups, which made find_free_extent() attempt to allocate
      a new block group via do_chunk_alloc(). However at do_chunk_alloc() we
      ended up allocating a new system chunk too and exceeded the threshold
      of 2Mb of reserved chunk bytes, which makes do_chunk_alloc() enter the
      final part of block group creation again (at
      btrfs_create_pending_block_groups()) and attempt to lock again the root
      of the chunk tree when it's already write locked by the same task.
      
      Similarly we can deadlock on extent tree nodes/leafs if while we are
      running delayed references we end up creating a new metadata block group
      in order to allocate a new node/leaf for the extent tree (as part of
      a CoW operation or growing the tree), as btrfs_create_pending_block_groups
      inserts items into the extent tree as well. In this case we get the
      following trace:
      
        [14242.773581] fio             D ffff880428ca3418     0  3615   3100 0x00000084
        [14242.773588]  ffff880428ca3418 ffff88042d66b000 ffff88042a03c800 ffff880428ca3438
        [14242.773594]  ffff880428ca4000 ffff8803e4b20190 ffff8803e4b201a8 ffff880428ca3460
        [14242.773600]  ffff8803e4b20188 ffff880428ca3438 ffffffff8175a437 ffff8803e4b20190
        [14242.773606] Call Trace:
        [14242.773613]  [<ffffffff8175a437>] schedule+0x37/0x80
        [14242.773656]  [<ffffffffa057ff07>] btrfs_tree_lock+0xa7/0x1f0 [btrfs]
        [14242.773664]  [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0
        [14242.773692]  [<ffffffffa0519c44>] btrfs_lock_root_node+0x34/0x50 [btrfs]
        [14242.773720]  [<ffffffffa051ef6b>] btrfs_search_slot+0x88b/0xa00 [btrfs]
        [14242.773750]  [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
        [14242.773758]  [<ffffffff811ef4a2>] ? kmem_cache_alloc+0x1d2/0x200
        [14242.773786]  [<ffffffffa0520ad1>] btrfs_insert_item+0x71/0xf0 [btrfs]
        [14242.773818]  [<ffffffffa052f292>] btrfs_create_pending_block_groups+0x102/0x200 [btrfs]
        [14242.773850]  [<ffffffffa052f96e>] do_chunk_alloc+0x2ae/0x2f0 [btrfs]
        [14242.773934]  [<ffffffffa0532825>] find_free_extent+0xa55/0xd90 [btrfs]
        [14242.773998]  [<ffffffffa0532c22>] btrfs_reserve_extent+0xc2/0x1d0 [btrfs]
        [14242.774041]  [<ffffffffa0532e38>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs]
        [14242.774078]  [<ffffffffa051a5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs]
        [14242.774118]  [<ffffffffa051abff>] btrfs_cow_block+0x11f/0x1c0 [btrfs]
        [14242.774155]  [<ffffffffa051e8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs]
        [14242.774194]  [<ffffffffa0528021>] ? __btrfs_free_extent.isra.70+0x2e1/0xcb0 [btrfs]
        [14242.774235]  [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs]
        [14242.774274]  [<ffffffffa051994a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
        [14242.774318]  [<ffffffffa052c433>] __btrfs_run_delayed_refs+0xbb3/0x1020 [btrfs]
        [14242.774358]  [<ffffffffa052f404>] btrfs_run_delayed_refs.part.78+0x74/0x280 [btrfs]
        [14242.774391]  [<ffffffffa052f627>] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
        [14242.774432]  [<ffffffffa05be236>] commit_cowonly_roots+0x8d/0x2bd [btrfs]
        [14242.774474]  [<ffffffffa059d07f>] ? __btrfs_run_delayed_items+0x1cf/0x210 [btrfs]
        [14242.774516]  [<ffffffffa05adac3>] ? btrfs_qgroup_account_extents+0x83/0x130 [btrfs]
        [14242.774558]  [<ffffffffa0544c40>] btrfs_commit_transaction+0x590/0xb40 [btrfs]
        [14242.774599]  [<ffffffffa0589b9d>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs]
        [14242.774642]  [<ffffffffa055ac54>] btrfs_sync_file+0x294/0x350 [btrfs]
        [14242.774650]  [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0
        [14242.774657]  [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180
        [14242.774663]  [<ffffffff8123e35d>] do_fsync+0x3d/0x70
        [14242.774669]  [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110
        [14242.774675]  [<ffffffff8123e600>] SyS_fsync+0x10/0x20
        [14242.774681]  [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      Fix this by never recursing into the finalization phase of block group
      creation and making sure we never trigger the finalization of block group
      creation while running delayed references.
      Reported-by: default avatarJosef Bacik <jbacik@fb.com>
      Fixes: 00d80e34 ("Btrfs: fix quick exhaustion of the system array in the superblock")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      d9a0540a
    • Filipe Manana's avatar
      Btrfs: update fix for read corruption of compressed and shared extents · 808f80b4
      Filipe Manana authored
      My previous fix in commit 005efedf ("Btrfs: fix read corruption of
      compressed and shared extents") was effective only if the compressed
      extents cover a file range with a length that is not a multiple of 16
      pages. That's because the detection of when we reached a different range
      of the file that shares the same compressed extent as the previously
      processed range was done at extent_io.c:__do_contiguous_readpages(),
      which covers subranges with a length up to 16 pages, because
      extent_readpages() groups the pages in clusters no larger than 16 pages.
      So fix this by tracking the start of the previously processed file
      range's extent map at extent_readpages().
      
      The following test case for fstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create our test file with a single extent of 64Kb that is going to
            # be compressed no matter which compression algo is used (zlib/lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
                $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone the compressed extent into an adjacent file offset.
            $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            echo "File digest before unmount:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
      
            # Remount the fs or clear the page cache to trigger the bug in
            # btrfs. Because the extent has an uncompressed length that is a
            # multiple of 16 pages, all the pages belonging to the second range
            # of the file (64K to 128K), which points to the same extent as the
            # first range (0K to 64K), had their contents full of zeroes instead
            # of the byte 0xaa. This was a bug exclusively in the read path of
            # compressed extents, the correct data was stored on disk, btrfs
            # just failed to fill in the pages correctly.
            _scratch_remount
      
            echo "File digest after remount:"
            # Must match the digest we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Tested-by: default avatarTimofey Titovets <nefelim4ag@gmail.com>
      808f80b4
    • Filipe Manana's avatar
      Btrfs: send, fix corner case for reference overwrite detection · b786f16a
      Filipe Manana authored
      When the inode given to did_overwrite_ref() matches the current progress
      and has a reference that collides with the reference of other inode that
      has the same number as the current progress, we were always telling our
      caller that the inode's reference was overwritten, which is incorrect
      because the other inode might be a new inode (different generation number)
      in which case we must return false from did_overwrite_ref() so that its
      callers don't use an orphanized path for the inode (as it will never be
      orphanized, instead it will be unlinked and the new inode created later).
      
      The following test case for fstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -fr $send_files_dir
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _need_to_be_root
      
        send_files_dir=$TEST_DIR/btrfs-test-$seq
      
        rm -f $seqres.full
        rm -fr $send_files_dir
        mkdir $send_files_dir
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test file with a single extent of 64K.
        mkdir -p $SCRATCH_MNT/foo
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" $SCRATCH_MNT/foo/bar \
            | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap1
        _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT \
            $SCRATCH_MNT/mysnap2
      
        echo "File digest before being replaced:"
        md5sum $SCRATCH_MNT/mysnap1/foo/bar | _filter_scratch
      
        # Remove the file and then create a new one in the same location with
        # the same name but with different content. This new file ends up
        # getting the same inode number as the previous one, because that inode
        # number was the highest inode number used by the snapshot's root and
        # therefore when attempting to find the a new inode number for the new
        # file, we end up reusing the same inode number. This happens because
        # currently btrfs uses the highest inode number summed by 1 for the
        # first inode created once a snapshot's root is loaded (done at
        # fs/btrfs/inode-map.c:btrfs_find_free_objectid in the linux kernel
        # tree).
        # Having these two different files in the snapshots with the same inode
        # number (but different generation numbers) caused the btrfs send code
        # to emit an incorrect path for the file when issuing an unlink
        # operation because it failed to realize they were different files.
        rm -f $SCRATCH_MNT/mysnap2/foo/bar
        $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 96K" \
            $SCRATCH_MNT/mysnap2/foo/bar | _filter_xfs_io
      
        _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT/mysnap2 \
            $SCRATCH_MNT/mysnap2_ro
      
        _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap
        _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 \
            $SCRATCH_MNT/mysnap2_ro -f $send_files_dir/2.snap
      
        echo "File digest in the original filesystem after being replaced:"
        md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch
      
        # Now recreate the filesystem by receiving both send streams and verify
        # we get the same file contents that the original filesystem had.
        _scratch_unmount
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/1.snap
        _run_btrfs_util_prog receive -vv $SCRATCH_MNT -f $send_files_dir/2.snap
      
        echo "File digest in the new filesystem:"
        # Must match the digest from the new file.
        md5sum $SCRATCH_MNT/mysnap2_ro/foo/bar | _filter_scratch
      
        status=0
        exit
      Reported-by: default avatarMartin Raiber <martin@urbackup.org>
      Fixes: 8b191a68 ("Btrfs: incremental send, check if orphanized dir inode needs delayed rename")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      b786f16a
    • Yang Shi's avatar
      arm64: convert patch_lock to raw lock · abffa6f3
      Yang Shi authored
      When running kprobe test on arm64 rt kernel, it reports the below warning:
      
      root@qemu7:~# modprobe kprobe_example
      BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
      in_atomic(): 0, irqs_disabled(): 128, pid: 484, name: modprobe
      CPU: 0 PID: 484 Comm: modprobe Not tainted 4.1.6-rt5 #2
      Hardware name: linux,dummy-virt (DT)
      Call trace:
      [<ffffffc0000891b8>] dump_backtrace+0x0/0x128
      [<ffffffc000089300>] show_stack+0x20/0x30
      [<ffffffc00061dae8>] dump_stack+0x1c/0x28
      [<ffffffc0000bbad0>] ___might_sleep+0x120/0x198
      [<ffffffc0006223e8>] rt_spin_lock+0x28/0x40
      [<ffffffc000622b30>] __aarch64_insn_write+0x28/0x78
      [<ffffffc000622e48>] aarch64_insn_patch_text_nosync+0x18/0x48
      [<ffffffc000622ee8>] aarch64_insn_patch_text_cb+0x70/0xa0
      [<ffffffc000622f40>] aarch64_insn_patch_text_sync+0x28/0x48
      [<ffffffc0006236e0>] arch_arm_kprobe+0x38/0x48
      [<ffffffc00010e6f4>] arm_kprobe+0x34/0x50
      [<ffffffc000110374>] register_kprobe+0x4cc/0x5b8
      [<ffffffbffc002038>] kprobe_init+0x38/0x7c [kprobe_example]
      [<ffffffc000084240>] do_one_initcall+0x90/0x1b0
      [<ffffffc00061c498>] do_init_module+0x6c/0x1cc
      [<ffffffc0000fd0c0>] load_module+0x17f8/0x1db0
      [<ffffffc0000fd8cc>] SyS_finit_module+0xb4/0xc8
      
      Convert patch_lock to raw loc kto avoid this issue.
      
      Although the problem is found on rt kernel, the fix should be applicable to
      mainline kernel too.
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      abffa6f3
    • Mark Salyzyn's avatar
      arm64: readahead: fault retry breaks mmap file read random detection · 569ba74a
      Mark Salyzyn authored
      This is the arm64 portion of commit 45cac65b ("readahead: fault
      retry breaks mmap file read random detection"), which was absent from
      the initial port and has since gone unnoticed. The original commit says:
      
      > .fault now can retry.  The retry can break state machine of .fault.  In
      > filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
      > try, since the page is in page cache now, ra->mmap_miss is decreased.  And
      > these are done in one fault, so we can't detect random mmap file access.
      >
      > Add a new flag to indicate .fault is tried once.  In the second try, skip
      > ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.
      
      With this change, Mark reports that:
      
      > Random read improves by 250%, sequential read improves by 40%, and
      > random write by 400% to an eMMC device with dm crypto wrapped around it.
      
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMark Salyzyn <salyzyn@android.com>
      Signed-off-by: default avatarRiley Andrews <riandrews@android.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      569ba74a