1. 13 Mar, 2019 21 commits
    • percpu: use chunk scan_hint to skip some scanning · d33d9f3d
      Dennis Zhou authored
      Just like blocks, chunks now maintain a scan_hint. This can be used to
      skip some scanning by promoting the scan_hint to be the contig_hint.
      The chunk's scan_hint is primarily updated on the backside and relies on
      full scanning when a block becomes free or the free region spans across
      blocks.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      d33d9f3d
    • percpu: convert chunk hints to be based on pcpu_block_md · 92c14cab
      Dennis Zhou authored
      As mentioned in the last patch, a chunk's hints are no different from a
      block's; they just cover more bits. This converts chunk level hints to
      use a pcpu_block_md to maintain them. This lets us reuse the same hint
      helper functions as a block. The left_free and right_free are unused by
      the chunk's pcpu_block_md.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      92c14cab
    • percpu: make pcpu_block_md generic · 047924c9
      Dennis Zhou authored
      In reality, a chunk is just a block covering a larger number of bits.
      The hints themselves are one and the same. Rather than maintaining the
      hints separately, first introduce nr_bits to genericize
      pcpu_block_update() to correctly maintain block->right_free. The next
      patch will convert chunk hints to be managed as a pcpu_block_md.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      047924c9
    • percpu: use block scan_hint to only scan forward · da3afdd5
      Dennis Zhou authored
      Blocks now remember the latest scan_hint. This can be used on the
      allocation path as when a contig_hint is broken, we can promote the
      scan_hint to the contig_hint and scan forward from there. This works
      because pcpu_block_refresh_hint() is only called on the allocation path
      while block free regions are updated manually in
      pcpu_block_update_hint_free().
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      da3afdd5
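The promotion step described in this commit can be sketched roughly as follows. This is a minimal illustration, not the kernel code: `struct block_md` and `promote_scan_hint()` are hypothetical names, and only the field names (contig_hint, scan_hint, first_free) follow the commit text. It assumes the scan_hint records the largest free area known before the contig_hint.

```c
/* Hypothetical, trimmed-down block metadata; not the kernel's pcpu_block_md. */
struct block_md {
	int contig_hint;        /* size of the largest known free run */
	int contig_hint_start;  /* offset of that run */
	int scan_hint;          /* size of the largest free run seen before it */
	int scan_hint_start;    /* offset of that run */
};

/*
 * Called after an allocation has broken the contig_hint.
 * If a scan_hint exists, it becomes the new contig_hint and scanning can
 * resume just past it. Otherwise the hints are cleared and scanning
 * restarts from first_free.
 * Returns the offset to resume scanning from.
 */
static int promote_scan_hint(struct block_md *b, int first_free)
{
	if (b->scan_hint) {
		b->contig_hint = b->scan_hint;
		b->contig_hint_start = b->scan_hint_start;
		b->scan_hint = 0;
		return b->contig_hint_start + b->contig_hint;
	}

	b->contig_hint = 0;
	b->contig_hint_start = 0;
	return first_free;
}
```

With a scan_hint of 4 bits at offset 8, the sketch resumes scanning at offset 12 instead of rescanning the block from first_free.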
    • percpu: remember largest area skipped during allocation · b89462a9
      Dennis Zhou authored
      Percpu allocations attempt to do first fit by scanning forward from the
      first_free of a block. However, fragmentation from allocation requests
      can cause holes not seen by block hint update functions. To address
      this, create a local version of bitmap_find_next_zero_area_off() that
      remembers the largest area skipped over. The caveat is that it only sees
      regions skipped over due to not fitting, not regions skipped due to
      alignment.
      
      Prior to updating the scan_hint, a scan backwards is done to try and
      recover free bits skipped due to alignment. While this can cause
      scanning to miss earlier possible free areas, smaller allocations will
      eventually fill those holes due to first fit.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      b89462a9
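As a rough illustration of "first fit that remembers the largest area skipped", here is a small C sketch. It is not the kernel's local version of bitmap_find_next_zero_area_off(): the function name is hypothetical, the map uses one byte per bit for readability, and alignment and the backward scan mentioned above are deliberately left out.

```c
#include <stddef.h>

/*
 * Find the first run of at least `want` zero entries in map[start..end).
 * map[] holds one byte per bit (an assumption of this sketch; a real
 * bitmap packs bits into machine words).
 *
 * On return, *largest is the length of the biggest free run that was
 * skipped because it was too small, and *largest_start is its offset.
 * Returns the offset of the fit, or `end` if nothing fits.
 */
static size_t find_fit_remember_skipped(const unsigned char *map,
					size_t start, size_t end, size_t want,
					size_t *largest_start, size_t *largest)
{
	size_t i = start;

	*largest = 0;
	*largest_start = start;

	while (i < end) {
		size_t run_start, run_len;

		/* Skip over allocated entries. */
		while (i < end && map[i])
			i++;
		if (i == end)
			break;

		/* Measure the free run, stopping once it is large enough. */
		run_start = i;
		while (i < end && !map[i] && i - run_start < want)
			i++;
		run_len = i - run_start;

		if (run_len >= want)
			return run_start;

		/* Too small to use: remember it if it is the largest so far. */
		if (run_len > *largest) {
			*largest = run_len;
			*largest_start = run_start;
		}
	}

	return end;
}
```

A caller could feed the remembered run into a scan_hint so that a later, smaller request does not have to rediscover it.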
    • percpu: add block level scan_hint · 382b88e9
      Dennis Zhou authored
      Fragmentation can cause both blocks and chunks to have an early
      first_free bit available, but only able to satisfy allocations much
      later on. This patch introduces a scan_hint to help mitigate some
      unnecessary scanning.
      
      The scan_hint remembers the largest area prior to the contig_hint. If
      the contig_hint == scan_hint, then scan_hint_start > contig_hint_start.
      This is necessary for scan_hint discovery when refreshing a block.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      382b88e9
    • percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE · b239f7da
      Dennis Zhou authored
      Previously, block size was flexible based on the constraint that the
      GCD(PCPU_BITMAP_BLOCK_SIZE, PAGE_SIZE) > 1. However, this carried the
      overhead that keeping a floating number of populated free pages required
      scanning over the free regions of a chunk.
      
      Setting the block size to be fixed at PAGE_SIZE lets us know when an
      empty page becomes used as we will break a full contig_hint of a block.
      This means we no longer have to scan the whole chunk upon breaking a
      contig_hint which empty page management piggybacked off. A later patch
      takes advantage of this to optimize the allocation path by only scanning
      forward using the scan_hint introduced later too.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      b239f7da
    • percpu: relegate chunks unusable when failing small allocations · 8744d859
      Dennis Zhou authored
      In certain cases, requestors of percpu memory may want specific
      alignments. However, it is possible to end up in situations where the
      contig_hint matches, but the alignment does not. This causes excess
      scanning of chunks that will fail. To prevent this, if a small
      allocation fails (< 32B), the chunk is moved to the empty list. Once an
      allocation is freed from that chunk, it is placed back into rotation.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      8744d859
    • percpu: manage chunks based on contig_bits instead of free_bytes · 3e54097b
      Dennis Zhou authored
      When a chunk becomes fragmented, it can end up having a large number of
      small allocation areas free. The free_bytes sorting of chunks leads to
      unnecessary checking of chunks that cannot satisfy the allocation.
      Switch to contig_bits sorting to prevent scanning chunks that may not be
      able to service the allocation request.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      3e54097b
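To see why filtering on contiguous free space avoids wasted scans, consider this minimal C sketch. The struct and function names are hypothetical and are not the kernel's; only the free_bytes versus contig_bits idea comes from the commit message, and free space is counted in bits here for simplicity.

```c
#include <stdbool.h>

/* Hypothetical per-chunk summary; not the kernel's struct pcpu_chunk. */
struct chunk_summary {
	int free_bits;    /* total free bits, possibly scattered */
	int contig_bits;  /* largest contiguous free run */
};

/* Total free space says little about whether a single request fits. */
static bool candidate_by_free(const struct chunk_summary *c, int want_bits)
{
	return c->free_bits >= want_bits;
}

/* The largest contiguous run rejects fragmented chunks early. */
static bool candidate_by_contig(const struct chunk_summary *c, int want_bits)
{
	return c->contig_bits >= want_bits;
}
```

A fragmented chunk with 64 free bits in runs of 4 passes the first check for a 16-bit request but fails the second, so it would not be scanned. A chunk that passes the contiguous check can still fail on alignment.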
    • percpu: introduce helper to determine if two regions overlap · d9f3a01e
      Dennis Zhou authored
      While block hints were always accurate, it's possible when spanning
      across blocks that we miss updating the chunk's contig_hint. Rather than
      rely on correctness of the boundaries of hints, do a full overlap
      comparison.
      
      A future patch introduces the scan_hint which makes the contig_hint
      slightly fuzzy as it can at times be smaller than the actual hint.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      d9f3a01e
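A full overlap comparison between two regions can be written in one line. The sketch below assumes half-open regions [a, b) and [x, y); the function name is hypothetical and the kernel helper may differ in naming and conventions.

```c
#include <stdbool.h>

/*
 * Returns true when the half-open regions [a, b) and [x, y) share at
 * least one bit. Regions that merely touch (b == x) do not overlap.
 */
static bool region_overlap(int a, int b, int x, int y)
{
	return (a < y) && (x < b);
}
```

Because the test compares both ends, it does not depend on either region starting or ending exactly on a hint boundary.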
    • percpu: do not search past bitmap when allocating an area · 8c43004a
      Dennis Zhou authored
      pcpu_find_block_fit() guarantees that a fit is found within
      PCPU_BITMAP_BLOCK_BITS. Iteration is used to determine the first fit as
      it compares against the block's contig_hint. This can lead to
      incorrectly scanning past the end of the bitmap. The behavior was okay
      given the check after for bit_off >= end and the correctness of the
      hints from pcpu_find_block_fit().
      
      This patch fixes this by bounding the end offset by the number of bits
      in a chunk.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      8c43004a
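The bounding described here amounts to clamping the end of the search window to the bitmap length. A minimal sketch, assuming a hypothetical BLOCK_BITS constant and helper names that are not taken from the kernel:

```c
/* Hypothetical block size in bits; the kernel derives its own value. */
#define BLOCK_BITS 1024

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * End offset of the search window for an allocation of alloc_bits
 * starting at `start`, clamped so it never exceeds the number of bits
 * in the chunk's bitmap.
 */
static int search_end(int start, int alloc_bits, int chunk_bits)
{
	return min_int(start + alloc_bits + BLOCK_BITS, chunk_bits);
}
```

Near the end of a 4096-bit chunk, a window that would otherwise run to offset 5032 stops at 4096.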
    • percpu: update free path with correct new free region · 8e5a2b98
      Dennis Zhou authored
      When updating the chunk's contig_hint on the free path of a hint that
      does not touch the page boundaries, it was incorrectly using the
      starting offset of the free region and the block's contig_hint. This
      could lead to incorrect assumptions about fit given a size and better
      alignment of the start. Fix this by using (end - start) as this is only
      called when updating a hint within a block.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: Peng Fan <peng.fan@nxp.com>
      8e5a2b98
    • Merge tag 'selinux-pr-20190312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · fa3d493f
      Linus Torvalds authored
      Pull selinux fixes from Paul Moore:
       "Two small fixes for SELinux in v5.1: one adds a buffer length check to
        the SELinux SCTP code, the other ensures that the SELinux labeling for
        a NFS mount is not disabled if the filesystem is mounted twice"
      
      * tag 'selinux-pr-20190312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        security/selinux: fix SECURITY_LSM_NATIVE_LABELS on reused superblock
        selinux: add the missing walk_size + len check in selinux_sctp_bind_connect
      fa3d493f
    • Merge tag 'apparmor-pr-2019-03-12' of... · 8636b1db
      Linus Torvalds authored
      Merge tag 'apparmor-pr-2019-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
      
      Pull apparmor fixes from John Johansen:
      
       - fix double free when failing to unpack secmark rules in policy
      
       - fix leak of dentry when profile is removed
      
      * tag 'apparmor-pr-2019-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
        apparmor: fix double free when unpack of secmark rules fails
        apparmor: delete the dentry in aafs_remove() to avoid a leak
        apparmor: Fix warning about unused function apparmor_ipv6_postroute
      8636b1db
    • Merge tag 'kconfig-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 5453a3df
      Linus Torvalds authored
      Pull Kconfig updates from Masahiro Yamada:
      
       - rename lexer and parse files
      
       - fix 'Save as' menu of xconfig
      
      * tag 'kconfig-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: fix 'Save As' menu of xconfig
        kconfig: rename zconf.y to parser.y
        kconfig: rename zconf.l to lexer.l
      5453a3df
    • Merge tag 'pwm/for-5.1-rc1' of... · add8462a
      Linus Torvalds authored
      Merge tag 'pwm/for-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm updates from Thierry Reding:
       "The changes for this cycle are across the board.
      
        The bulk of it is cleanups, but there's also new device support in
        some drivers as well as more conversions to the atomic API"
      
      * tag 'pwm/for-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (24 commits)
        pwm: atmel: Remove useless symbolic definitions
        pwm: bcm-kona: Update macros to remove braces around numbers
        pwm: imx27: Only enable the clocks once in .get_state()
        pwm: rcar: Improve calculation of divider
        pwm: rcar: Remove legacy APIs
        pwm: rcar: Use "atomic" API on rcar_pwm_resume()
        pwm: rcar: Add support "atomic" API
        pwm: atmel: Add support for SAM9X60's PWM controller
        pwm: atmel: Add PWM binding for SAM9X60
        pwm: atmel: Rename objects of type atmel_pwm_data
        pwm: atmel: Add support for controllers with 32 bit counters
        pwm: atmel: Add struct atmel_pwm_data
        pwm: Add MediaTek MT8183 display PWM driver support
        pwm: hibvt: Add hi3559v100 support
        dt-bindings: pwm: hibvt: Add hi3559v100 support
        pwm: hibvt: Use individual struct per of-data
        pwm: imx: Signedness bug in imx_pwm_get_state()
        pwm: imx: Split into two drivers
        pwm: imx: Don't print an error on -EPROBE_DEFER
        pwm: imx: Set driver data earlier simplifying the end of ->probe()
        ...
      add8462a
    • Merge tag 'mailbox-v5.1' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 3a186d38
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
      
       - mailbox-test: support multiple controller instances
      
       - misc cleanup: IMX, STM32 and Tegra
      
       - new driver: ZynqMP IPI
      
      * tag 'mailbox-v5.1' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: imx: keep MU irq working during suspend/resume
        dt-bindings: mailbox: Add Xilinx IPI Mailbox
        mailbox: ZynqMP IPI mailbox controller
        mailbox: stm32-ipcc: remove useless device_init_wakeup call
        mailbox: stm32-ipcc: do not enable wakeup source by default
        mailbox: mailbox-test: fix null pointer if no mmio
        mailbox: mailbox-test: fix debugfs in multi-instances
        mailbox: tegra-hsp: mark suspend function as __maybe_unused
      3a186d38
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · dac0bde4
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "This fixes a bug in the newly added Exynos5433 AES code as well as an
        old one in the caam driver"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: caam - add missing put_device() call
        crypto: s5p-sss - fix AES support for Exynos5433
      dac0bde4
    • Merge tag 'libnvdimm-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 5ea6718b
      Linus Torvalds authored
      Pull libnvdimm updates from Dan Williams:
       "The bulk of this has been in -next since before the merge window
        opened, with no known collisions / issues reported.
      
        The only detail worth noting, outside the summary below, is that the
        "libnvdimm-start-pad" topic has been truncated to just cleanups and
        small fixes. The full topic branch would have doubled down on hacks
        around the "section alignment" limitation of the core-mm, instead
        effort is now being spent to address that root issue in the memory
        hotplug implementation for v5.2.
      
         - Fix nfit-bus command submission regression
      
         - Support retrieval of short-ARS results if the ARS state is
           "requires continuation", and even if the "no_init_ars" module
           parameter is specified
      
         - Allow busy-polling of the kernel ARS state by allowing root to
           reset the exponential back-off timer
      
         - Filter potentially stale ARS results by tracking query-ARS relative
           to the previous start-ARS
      
         - Enhance dax_device alignment checks
      
         - Add support for the Hyper-V family of device-specific-methods
           (DSMs)
      
         - Add several fixes and workarounds for Hyper-V compatibility
      
         - Fix support to cache the dirty-shutdown-count at init"
      
      * tag 'libnvdimm-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (25 commits)
        libnvdimm/namespace: Clean up holder_class_store()
        libnvdimm/of_pmem: Fix platform_no_drv_owner.cocci warnings
        acpi/nfit: Update NFIT flags error message
        libnvdimm/btt: Fix LBA masking during 'free list' population
        libnvdimm/btt: Remove unnecessary code in btt_freelist_init
        libnvdimm/pfn: Remove dax_label_reserve
        dax: Check the end of the block-device capacity with dax_direct_access()
        nfit/ars: Avoid stale ARS results
        nfit/ars: Allow root to busy-poll the ARS state machine
        nfit/ars: Introduce scrub_flags
        nfit/ars: Remove ars_start_flags
        nfit/ars: Attempt short-ARS even in the no_init_ars case
        nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot
        acpi/nfit: Require opt-in for read-only label configurations
        libnvdimm/pmem: Honor force_raw for legacy pmem regions
        libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init()
        libnvdimm: Fix altmap reservation size calculation
        libnvdimm, pfn: Fix over-trim in trim_pfn_device()
        acpi/nfit: Fix bus command validation
        libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM family
        ...
      5ea6718b
    • Merge tag 'fsdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 3bb0f28d
      Linus Torvalds authored
      Pull filesystem-dax updates from Dan Williams:
      
       - Fix handling of PMD-sized entries in the Xarray that lead to a crash
         scenario
      
       - Miscellaneous cleanups and small fixes
      
      * tag 'fsdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        dax: Flush partial PMDs correctly
        fs/dax: NIT fix comment regarding start/end vs range
        fs/dax: Convert to use vmf_error()
      3bb0f28d
    • Merge tag 'upstream-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs · a840b56b
      Linus Torvalds authored
      Pull UBI and UBIFS updates from Richard Weinberger:
      
       - A new interface for UBI to deal better with read disturb
      
       - Reject unsupported ioctl flags in UBIFS (xfstests found it)
      
      * tag 'upstream-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
        ubi: wl: Silence uninitialized variable warning
        ubifs: Reject unsupported ioctl flags explicitly
        ubi: Expose the bitrot interface
        ubi: Introduce in_pq()
      a840b56b
  2. 12 Mar, 2019 19 commits
    • Merge tag 'nfsd-5.1' of git://linux-nfs.org/~bfields/linux · ebc551f2
      Linus Torvalds authored
      Pull NFS server updates from Bruce Fields:
       "Miscellaneous NFS server fixes.
      
        Probably the most visible bug is one that could artificially limit
        NFSv4.1 performance by limiting the number of outstanding rpcs from a
        single client.
      
        Neil Brown also gets a special mention for fixing a 14.5-year-old
        memory-corruption bug in the encoding of NFSv3 readdir responses"
      
      * tag 'nfsd-5.1' of git://linux-nfs.org/~bfields/linux:
        nfsd: allow nfsv3 readdir request to be larger.
        nfsd: fix wrong check in write_v4_end_grace()
        nfsd: fix memory corruption caused by readdir
        nfsd: fix performance-limiting session calculation
        svcrpc: fix UDP on servers with lots of threads
        svcrdma: Remove syslog warnings in work completion handlers
        svcrdma: Squelch compiler warning when SUNRPC_DEBUG is disabled
        svcrdma: Use struct_size() in kmalloc()
        svcrpc: fix unlikely races preventing queueing of sockets
        svcrpc: svc_xprt_has_something_to_do seems a little long
        SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
        nfsd: fix an IS_ERR() vs NULL check
      ebc551f2
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · a5adcfca
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "A large number of bug fixes and cleanups.
      
        One new feature to allow users to more easily find the jbd2 journal
        thread for a particular ext4 file system"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
        jbd2: jbd2_get_transaction does not need to return a value
        jbd2: fix invalid descriptor block checksum
        ext4: fix bigalloc cluster freeing when hole punching under load
        ext4: add sysfs attr /sys/fs/ext4/<disk>/journal_task
        ext4: Change debugging support help prefix from EXT4 to Ext4
        ext4: fix compile error when using BUFFER_TRACE
        jbd2: fix compile warning when using JBUFFER_TRACE
        ext4: fix some error pointer dereferences
        ext4: annotate more implicit fall throughs
        ext4: annotate implicit fall throughs
        ext4: don't update s_rev_level if not required
        jbd2: fold jbd2_superblock_csum_{verify,set} into their callers
        jbd2: fix race when writing superblock
        ext4: fix crash during online resizing
        ext4: disallow files with EXT4_JOURNAL_DATA_FL from EXT4_IOC_SWAP_BOOT
        ext4: add mask of ext4 flags to swap
        ext4: update quota information while swapping boot loader inode
        ext4: cleanup pagecache before swap i_data
        ext4: fix check of inode in swap_inode_boot_loader
        ext4: unlock unused_pages timely when doing writeback
        ...
      a5adcfca
    • Merge tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-client · 2b0a80b0
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "The highlights are:
      
         - rbd will now ignore discards that aren't aligned and big enough to
           actually free up some space (myself). This is controlled by the new
           alloc_size map option and can be disabled if needed.
      
         - support for rbd deep-flatten feature (myself). Deep-flatten allows
           "rbd flatten" to fully disconnect the clone image and its snapshots
           from the parent and make the parent snapshot removable.
      
         - a new round of cap handling improvements (Zheng Yan). The kernel
           client should now be much more prompt about releasing its caps and
           it is possible to put a limit on the number of caps held.
      
         - support for getting ceph.dir.pin extended attribute (Zheng Yan)"
      
      * tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-client: (26 commits)
        Documentation: modern versions of ceph are not backed by btrfs
        rbd: advertise support for RBD_FEATURE_DEEP_FLATTEN
        rbd: whole-object write and zeroout should copyup when snapshots exist
        rbd: copyup with an empty snapshot context (aka deep-copyup)
        rbd: introduce rbd_obj_issue_copyup_ops()
        rbd: stop copying num_osd_ops in rbd_obj_issue_copyup()
        rbd: factor out __rbd_osd_req_create()
        rbd: clear ->xferred on error from rbd_obj_issue_copyup()
        rbd: remove experimental designation from kernel layering
        ceph: add mount option to limit caps count
        ceph: periodically trim stale dentries
        ceph: delete stale dentry when last reference is dropped
        ceph: remove dentry_lru file from debugfs
        ceph: touch existing cap when handling reply
        ceph: pass inclusive lend parameter to filemap_write_and_wait_range()
        rbd: round off and ignore discards that are too small
        rbd: handle DISCARD and WRITE_ZEROES separately
        rbd: get rid of obj_req->obj_request_count
        libceph: use struct_size() for kmalloc() in crush_decode()
        ceph: send cap releases more aggressively
        ...
      2b0a80b0
    • Merge tag 'for-5.1-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 92825b02
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Correctness and deadlock fixes"
      
      * tag 'for-5.1-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: zstd: ensure reclaim timer is properly cleaned up
        btrfs: move ulist allocation out of transaction in quota enable
        btrfs: save drop_progress if we drop refs at all
        btrfs: check for refs on snapshot delete resume
        Btrfs: fix deadlock between clone/dedupe and rename
        Btrfs: fix corruption reading shared and compressed extents after hole punching
      92825b02
    • Merge tag 'nfs-for-5.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 1fbf3e48
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Highlights include:
      
        Stable fixes:
         - Fixes for NFS I/O request leakages
         - Fix error handling paths in the NFS I/O recoalescing code
         - Reinitialise NFSv4.1 sequence results before retransmitting a
           request
         - Fix a soft lockup in the delegation recovery code
         - Bulk destroy of layouts needs to be safe w.r.t. umount
         - Prevent thundering herd issues when the SUNRPC socket is not
           connected
         - Respect RPC call timeouts when retrying transmission
      
        Features:
         - Convert rpc auth layer to use xdr_streams
         - Config option to disable insecure RPCSEC_GSS crypto types
         - Reduce size of RPC receive buffers
         - Readdirplus optimization by cache mechanism
         - Convert SUNRPC socket send code to use iov_iter()
         - SUNRPC micro-optimisations to avoid indirect calls
         - Add support for the pNFS LAYOUTERROR operation and use it with the
           pNFS/flexfiles driver
         - Add trace events to report non-zero NFS status codes
         - Various removals of unnecessary dprintks
      
        Bugfixes and cleanups:
         - Fix a number of sparse warnings and documentation format warnings
         - Fix nfs_parse_devname to not modify its argument
         - Fix potential corruption of page being written through pNFS/blocks
         - fix xfstest generic/099 failures on nfsv3
         - Avoid NFSv4.1 "false retries" when RPC calls are interrupted
         - Abort I/O early if the pNFS/flexfiles layout segment was
           invalidated
         - Avoid unnecessary pNFS/flexfiles layout invalidations"
      
      * tag 'nfs-for-5.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits)
        SUNRPC: Take the transport send lock before binding+connecting
        SUNRPC: Micro-optimise when the task is known not to be sleeping
        SUNRPC: Check whether the task was transmitted before rebind/reconnect
        SUNRPC: Remove redundant calls to RPC_IS_QUEUED()
        SUNRPC: Clean up
        SUNRPC: Respect RPC call timeouts when retrying transmission
        SUNRPC: Fix up RPC back channel transmission
        SUNRPC: Prevent thundering herd when the socket is not connected
        SUNRPC: Allow dynamic allocation of back channel slots
        NFSv4.1: Bump the default callback session slot count to 16
        SUNRPC: Convert remaining GFP_NOIO, and GFP_NOWAIT sites in sunrpc
        NFS/flexfiles: Clean up mirror DS initialisation
        NFS/flexfiles: Remove dead code in ff_layout_mirror_valid()
        NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid()
        NFS/flexfile: Simplify nfs4_ff_layout_ds_version()
        NFS/flexfiles: Simplify ff_layout_get_ds_cred()
        NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client()
        NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh()
        NFS/flexfiles: Speed up read failover when DSes are down
        NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive
        ...
      1fbf3e48
    • Merge tag 'ovl-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · f88c5942
      Linus Torvalds authored
      Pull overlayfs updates from Miklos Szeredi:
       "Fix copy up of security related xattrs"
      
      * tag 'ovl-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: Do not lose security.capability xattr over metadata file copy-up
        ovl: During copy up, first copy up data and then xattrs
      f88c5942
    • Merge tag 'fuse-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · dfee9c25
      Linus Torvalds authored
      Pull fuse updates from Miklos Szeredi:
       "Scalability and performance improvements, as well as minor bug fixes
        and cleanups"
      
      * tag 'fuse-update-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
        fuse: cache readdir calls if filesystem opts out of opendir
        fuse: support clients that don't implement 'opendir'
        fuse: lift bad inode checks into callers
        fuse: multiplex cached/direct_io file operations
        fuse add copy_file_range to direct io fops
        fuse: use iov_iter based generic splice helpers
        fuse: Switch to using async direct IO for FOPEN_DIRECT_IO
        fuse: use atomic64_t for khctr
        fuse: clean up aborted
        fuse: Protect ff->reserved_req via corresponding fi->lock
        fuse: Protect fi->nlookup with fi->lock
        fuse: Introduce fi->lock to protect write related fields
        fuse: Convert fc->attr_version into atomic64_t
        fuse: Add fuse_inode argument to fuse_prepare_release()
        fuse: Verify userspace asks to requeue interrupt that we really sent
        fuse: Do some refactoring in fuse_dev_do_write()
        fuse: Wake up req->waitq of only if not background
        fuse: Optimize request_end() by not taking fiq->waitq.lock
        fuse: Kill fasync only if interrupt is queued in queue_interrupt()
        fuse: Remove stale comment in end_requests()
        ...
      dfee9c25
    • Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 7b47a9e7
      Linus Torvalds authored
      Pull vfs mount infrastructure updates from Al Viro:
       "The rest of core infrastructure; no new syscalls in that pile, but the
        old parts are switched to new infrastructure. At that point
        conversions of individual filesystems can happen independently; some
        are done here (afs, cgroup, procfs, etc.), there's also a large series
        outside of that pile dealing with NFS (quite a bit of option-parsing
        stuff is getting used there - it's one of the most convoluted
        filesystems in terms of mount-related logics), but NFS bits are the
        next cycle fodder.
      
        It got seriously simplified since the last cycle; documentation is
        probably the weakest bit at the moment - I considered dropping the
        commit introducing Documentation/filesystems/mount_api.txt (cutting
        the size increase by quarter ;-), but decided that it would be better
        to fix it up after -rc1 instead.
      
        That pile allows to do followup work in independent branches, which
        should make life much easier for the next cycle. fs/super.c size
        increase is unpleasant; there's a followup series that allows to
        shrink it considerably, but I decided to leave that until the next
        cycle"
      
      * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
        afs: Use fs_context to pass parameters over automount
        afs: Add fs_context support
        vfs: Add some logging to the core users of the fs_context log
        vfs: Implement logging through fs_context
        vfs: Provide documentation for new mount API
        vfs: Remove kern_mount_data()
        hugetlbfs: Convert to fs_context
        cpuset: Use fs_context
        kernfs, sysfs, cgroup, intel_rdt: Support fs_context
        cgroup: store a reference to cgroup_ns into cgroup_fs_context
        cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
        cgroup_do_mount(): massage calling conventions
        cgroup: stash cgroup_root reference into cgroup_fs_context
        cgroup2: switch to option-by-option parsing
        cgroup1: switch to option-by-option parsing
        cgroup: take options parsing into ->parse_monolithic()
        cgroup: fold cgroup1_mount() into cgroup1_get_tree()
        cgroup: start switching to fs_context
        ipc: Convert mqueue fs to fs_context
        proc: Add fs_context support to procfs
        ...
      7b47a9e7
    • Linus Torvalds's avatar
      Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · dbc2fba3
      Linus Torvalds authored
      Pull iov_iter updates from Al Viro:
       "A couple of iov_iter patches - Christoph's crapectomy (the last
        remaining user of iov_for_each() went away with lustre, IIRC) and
        Eric's optimization of sanity checks"
      
      * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        iov_iter: optimize page_copy_sane()
        uio: remove the unused iov_for_each macro
      dbc2fba3
    • Linus Torvalds's avatar
      Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 5f739e4a
      Linus Torvalds authored
      Pull misc vfs updates from Al Viro:
       "Assorted fixes (really no common topic here)"
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        vfs: Make __vfs_write() static
        vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
        pipe: stop using ->can_merge
        splice: don't merge into linked buffers
        fs: move generic stat response attr handling to vfs_getattr_nosec
        orangefs: don't reinitialize result_mask in ->getattr
        fs/devpts: always delete dcache dentry-s in dput()
      5f739e4a
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · a667cb7a
      Linus Torvalds authored
      Merge misc updates from Andrew Morton:
      
       - a few misc things
      
       - the rest of MM
      
       - remove flex_arrays, replace with new simple radix-tree implementation
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits)
        Drop flex_arrays
        sctp: convert to genradix
        proc: commit to genradix
        generic radix trees
        selinux: convert to kvmalloc
        md: convert to kvmalloc
        openvswitch: convert to kvmalloc
        of: fix kmemleak crash caused by imbalance in early memory reservation
        mm: memblock: update comments and kernel-doc
        memblock: split checks whether a region should be skipped to a helper function
        memblock: remove memblock_{set,clear}_region_flags
        memblock: drop memblock_alloc_*_nopanic() variants
        memblock: memblock_alloc_try_nid: don't panic
        treewide: add checks for the return value of memblock_alloc*()
        swiotlb: add checks for the return value of memblock_alloc*()
        init/main: add checks for the return value of memblock_alloc*()
        mm/percpu: add checks for the return value of memblock_alloc*()
        sparc: add checks for the return value of memblock_alloc*()
        ia64: add checks for the return value of memblock_alloc*()
        arch: don't memset(0) memory returned by memblock_alloc()
        ...
      a667cb7a
    • Kent Overstreet's avatar
      Drop flex_arrays · 586187d7
      Kent Overstreet authored
      All existing users have been converted to generic radix trees
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-8-kent.overstreet@gmail.com
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      586187d7
    • Kent Overstreet's avatar
      sctp: convert to genradix · 2075e50c
      Kent Overstreet authored
      This also makes sctp_stream_alloc_(out|in) saner, in that they no longer
      allocate new flex_arrays/genradixes, they just preallocate more
      elements.
      
      This code does however have a suspicious lack of locking.
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-7-kent.overstreet@gmail.com
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2075e50c
    • Kent Overstreet's avatar
      proc: commit to genradix · 94f8f3b0
      Kent Overstreet authored
      The new generic radix trees have a simpler API and implementation, and
      no limitations on number of elements, so all flex_array users are being
      converted
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-6-kent.overstreet@gmail.com
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      94f8f3b0
    • Kent Overstreet's avatar
      generic radix trees · ba20ba2e
      Kent Overstreet authored
      Very simple radix tree implementation that supports storing arbitrary
      size entries, up to PAGE_SIZE - upcoming patches will convert existing
      flex_array users to genradixes.  The new genradix code has a much
      simpler API and implementation, and doesn't have a hard limit on the
      number of elements like flex_array does.
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-5-kent.overstreet@gmail.com
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ba20ba2e
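
The commit message above describes a radix tree that stores fixed-size entries (each no larger than a page) and has no hard limit on the number of elements. The following is a minimal userspace sketch of that idea, not the kernel implementation: every `toy_*` name, the 4096-byte page size, and the 512-way fan-out are assumptions made for illustration. Entries are packed into leaf pages, and the tree gains a level whenever an index falls outside what the current depth can address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u            /* assumed stand-in for PAGE_SIZE */
#define TOY_SHIFT     9u               /* assumed: 512 children per interior node */
#define TOY_FANOUT    (1u << TOY_SHIFT)

struct toy_node {
    void *slot[TOY_FANOUT];            /* child nodes or leaf pages */
};

struct toy_genradix {
    void *root;                        /* leaf page when depth == 0 */
    unsigned depth;                    /* number of interior levels */
    size_t obj_size;                   /* entry size, at most one page */
};

static void toy_genradix_init(struct toy_genradix *r, size_t obj_size)
{
    assert(obj_size > 0 && obj_size <= TOY_PAGE_SIZE);
    r->root = NULL;
    r->depth = 0;
    r->obj_size = obj_size;
}

/* Interior levels needed to address leaf page number 'leaf'. */
static unsigned toy_depth_for(size_t leaf)
{
    unsigned depth = 0;

    while (leaf) {                     /* each level resolves TOY_SHIFT bits */
        leaf >>= TOY_SHIFT;
        depth++;
    }
    return depth;
}

/* Shared walk: allocates missing nodes and pages only when 'alloc' is set. */
static void *toy_walk(struct toy_genradix *r, size_t idx, int alloc)
{
    size_t per_page = TOY_PAGE_SIZE / r->obj_size;
    size_t leaf = idx / per_page;
    unsigned need = toy_depth_for(leaf);
    void **slotp;

    if (need > r->depth && !alloc)
        return NULL;
    while (r->depth < need) {          /* grow upward: old root becomes child 0 */
        struct toy_node *n = calloc(1, sizeof(*n));

        if (!n)
            return NULL;
        n->slot[0] = r->root;
        r->root = n;
        r->depth++;
    }

    slotp = &r->root;
    for (unsigned level = r->depth; level > 0; level--) {
        struct toy_node *n;

        if (!*slotp) {
            if (!alloc || !(*slotp = calloc(1, sizeof(struct toy_node))))
                return NULL;
        }
        n = *slotp;
        slotp = &n->slot[(leaf >> ((level - 1) * TOY_SHIFT)) & (TOY_FANOUT - 1)];
    }
    if (!*slotp) {                     /* leaf pages start out zeroed */
        if (!alloc || !(*slotp = calloc(1, TOY_PAGE_SIZE)))
            return NULL;
    }
    return (char *)*slotp + (idx % per_page) * r->obj_size;
}

/* Lookup only: NULL if the entry's page was never allocated. */
static void *toy_genradix_ptr(struct toy_genradix *r, size_t idx)
{
    return toy_walk(r, idx, 0);
}

/* Lookup that allocates the path on demand; NULL only on allocation failure. */
static void *toy_genradix_ptr_alloc(struct toy_genradix *r, size_t idx)
{
    return toy_walk(r, idx, 1);
}

static void toy_free_subtree(void *p, unsigned level)
{
    if (!p)
        return;
    if (level > 0) {
        struct toy_node *n = p;

        for (size_t i = 0; i < TOY_FANOUT; i++)
            toy_free_subtree(n->slot[i], level - 1);
    }
    free(p);
}

static void toy_genradix_free(struct toy_genradix *r)
{
    toy_free_subtree(r->root, r->depth);
    r->root = NULL;
    r->depth = 0;
}
```

In this sketch a caller initialises the tree with an entry size, obtains a pointer with `toy_genradix_ptr_alloc()` before writing, and uses `toy_genradix_ptr()` for reads that must not allocate. Entries never straddle a page boundary because each leaf page holds a whole number of entries.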
    • Kent Overstreet's avatar
      selinux: convert to kvmalloc · acdf52d9
      Kent Overstreet authored
      The flex arrays were being used for constant sized arrays, so there's no
      benefit to using flex_arrays over something simpler.
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-4-kent.overstreet@gmail.com
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      acdf52d9
    • Kent Overstreet's avatar
      md: convert to kvmalloc · b330e6a4
      Kent Overstreet authored
      The code really just wants a big flat buffer, so just do that.
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-3-kent.overstreet@gmail.com
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reviewed-by: Matthew Wilcox <willy@infradead.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b330e6a4
    • Kent Overstreet's avatar
      openvswitch: convert to kvmalloc · ee9c5e67
      Kent Overstreet authored
      Patch series "generic radix trees; drop flex arrays".
      
      This patch (of 7):
      
      There was no real need for this code to be using flexarrays, it's just
      implementing a hash table - ideally it would be using rhashtables, but
      that conversion would be significantly more complicated.
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reviewed-by: Matthew Wilcox <willy@infradead.org>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ee9c5e67
    • Mike Rapoport's avatar
      of: fix kmemleak crash caused by imbalance in early memory reservation · 5c01a25a
      Mike Rapoport authored
      Marc Gonzalez reported the following kmemleak crash:
      
        Unable to handle kernel paging request at virtual address ffffffc021e00000
        Mem abort info:
          ESR = 0x96000006
          Exception class = DABT (current EL), IL = 32 bits
          SET = 0, FnV = 0
          EA = 0, S1PTW = 0
        Data abort info:
          ISV = 0, ISS = 0x00000006
          CM = 0, WnR = 0
        swapper pgtable: 4k pages, 39-bit VAs, pgdp = (____ptrval____)
        [ffffffc021e00000] pgd=000000017e3ba803, pud=000000017e3ba803, pmd=0000000000000000
        Internal error: Oops: 96000006 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 6 PID: 523 Comm: kmemleak Tainted: G S      W         5.0.0-rc1 #13
        Hardware name: Qualcomm Technologies, Inc. MSM8998 v1 MTP (DT)
        pstate: 80000085 (Nzcv daIf -PAN -UAO)
        pc : scan_block+0x70/0x190
        lr : scan_block+0x6c/0x190
        Process kmemleak (pid: 523, stack limit = 0x(____ptrval____))
        Call trace:
         scan_block+0x70/0x190
         scan_gray_list+0x108/0x1c0
         kmemleak_scan+0x33c/0x7c0
         kmemleak_scan_thread+0x98/0xf0
         kthread+0x11c/0x120
         ret_from_fork+0x10/0x1c
        Code: f9000fb4 d503201f 97ffffd2 35000580 (f9400260)
      
      The crash happens when a no-map area is allocated in
      early_init_dt_alloc_reserved_memory_arch().  The allocated region is
      registered with kmemleak, but it is then removed from memblock using
      memblock_remove() that is not kmemleak-aware.
      
      Replacing memblock_phys_alloc_range() with memblock_find_in_range()
      makes sure that the allocated memory is not added to kmemleak and then
      memblock_remove()'ing this memory is safe.
      
      As a bonus, since memblock_find_in_range() ensures the allocation in the
      specified range, the bounds check can be removed.
      
      [rppt@linux.ibm.com: of: fix parameters order for call to memblock_find_in_range()]
        Link: http://lkml.kernel.org/r/20190221112619.GC32004@rapoport-lnx
      Link: http://lkml.kernel.org/r/20190213181921.GB15270@rapoport-lnx
      Fixes: 3f0c8206 ("drivers: of: add initialization code for dynamic reserved memory")
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Acked-by: Prateek Patel <prpatel@nvidia.com>
      Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c01a25a
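
The imbalance described in the commit message above can be illustrated with a toy userspace model. This is a sketch under stated assumptions and does not reproduce memblock or kmemleak code: all `toy_*` names and the addresses used are hypothetical. One allocation path registers the range with a tracker (standing in for the kmemleak-aware allocation), the other only finds a range without registering it, and removal never updates the tracker. A scan is considered unsafe when a tracked range overlaps a removed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RANGES 8

struct toy_range {
    unsigned long base;
    unsigned long size;
};

struct toy_state {
    struct toy_range tracked[TOY_MAX_RANGES];  /* ranges the scanner will read */
    size_t nr_tracked;
    struct toy_range removed[TOY_MAX_RANGES];  /* ranges that are no longer mapped */
    size_t nr_removed;
};

static void toy_push(struct toy_range *list, size_t *nr,
                     unsigned long base, unsigned long size)
{
    assert(*nr < TOY_MAX_RANGES);
    list[*nr].base = base;
    list[*nr].size = size;
    (*nr)++;
}

/* Models an allocation that also registers the range with the tracker. */
static void toy_alloc_tracked(struct toy_state *s,
                              unsigned long base, unsigned long size)
{
    toy_push(s->tracked, &s->nr_tracked, base, size);
}

/* Models merely finding a free range: nothing is registered. */
static void toy_find_only(struct toy_state *s,
                          unsigned long base, unsigned long size)
{
    (void)s;
    (void)base;
    (void)size;
}

/* Models a removal that is not tracker-aware: 'tracked' is left untouched. */
static void toy_remove(struct toy_state *s,
                       unsigned long base, unsigned long size)
{
    toy_push(s->removed, &s->nr_removed, base, size);
}

/* The scan is safe only if no tracked range overlaps a removed range. */
static bool toy_scan_is_safe(const struct toy_state *s)
{
    for (size_t i = 0; i < s->nr_tracked; i++) {
        for (size_t j = 0; j < s->nr_removed; j++) {
            const struct toy_range *t = &s->tracked[i];
            const struct toy_range *r = &s->removed[j];

            if (t->base < r->base + r->size && r->base < t->base + t->size)
                return false;
        }
    }
    return true;
}
```

In this model, allocating with `toy_alloc_tracked()` and then calling `toy_remove()` on the same range leaves a stale tracked entry, so the scan is unsafe; using `toy_find_only()` before `toy_remove()` leaves nothing for the scanner to trip over.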