1. 23 Apr, 2019 2 commits
  2. 22 Apr, 2019 1 commit
    • Brian Foster's avatar
      xfs: make tr_growdata a permanent transaction · 945c941f
      Brian Foster authored
      The growdata transaction is used by growfs operations to increase
      the data size of the filesystem. Part of this sequence involves
      extending the size of the last preexisting AG in the fs, if
      necessary. This is implemented by freeing the newly available
      physical range to the AG.
      
      tr_growdata is not a permanent transaction, however, and block
      allocation transactions must be permanent to handle deferred frees
      of AGFL blocks. If the grow operation extends an existing AG that
      requires AGFL fixing, assert failures occur due to a populated dfops
      list on a non-permanent transaction and the AGFL free does not
      occur. This is reproduced (rarely) by xfs/104.
      
      Change tr_growdata to a permanent transaction with a default log
      count. This increases initial transaction reservation size, but
      growfs is an infrequent and non-performance critical operation and
      so should have minimal impact.
      Reported-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      [darrick: add a comment to the assert]
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      945c941f
  3. 16 Apr, 2019 8 commits
    • Darrick J. Wong's avatar
      xfs: merge adjacent io completions of the same type · 3994fc48
      Darrick J. Wong authored
      It's possible for pagecache writeback to split up a large amount of work
      into smaller pieces for throttling purposes or to reduce the amount of
      time a writeback operation is pending.  Whatever the reason, XFS can end
      up with a bunch of IO completions that call for the same operation to be
      performed on a contiguous extent mapping.  Since mappings are extent
      based in XFS, we'd prefer to run fewer transactions when we can.
      
      When we're processing an ioend on the list of io completions, check to
      see if the next items on the list are both adjacent and of the same
      type.  If so, we can merge the completions to reduce transaction
      overhead.
      
      On fast storage this doesn't seem to make much of a difference in
      performance, though the number of transactions for an overnight xfstests
      run seems to drop by ~5%.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      3994fc48
    • Darrick J. Wong's avatar
      xfs: remove unused m_data_workqueue · 28408243
      Darrick J. Wong authored
      Now that we're no longer using m_data_workqueue, remove it.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      28408243
    • Darrick J. Wong's avatar
      xfs: implement per-inode writeback completion queues · cb357bf3
      Darrick J. Wong authored
      When scheduling writeback of dirty file data in the page cache, XFS uses
      IO completion workqueue items to ensure that filesystem metadata only
      updates after the write completes successfully.  This is essential for
      converting unwritten extents to real extents at the right time and
      performing COW remappings.
      
      Unfortunately, XFS queues each IO completion work item to an unbounded
      workqueue, which means that the kernel can spawn dozens of threads to
      try to handle the items quickly.  These threads need to take the ILOCK
      to update file metadata, which results in heavy ILOCK contention if a
      large number of the work items target a single file, which is
      inefficient.
      
      Worse yet, the writeback completion threads get stuck waiting for the
      ILOCK while holding transaction reservations, which can use up all
      available log reservation space.  When that happens, metadata updates to
      other parts of the filesystem grind to a halt, even if the filesystem
      could otherwise have handled it.
      
      Even worse, if one of the things grinding to a halt happens to be a
      thread in the middle of a defer-ops finish holding the same ILOCK and
      trying to obtain more log reservation having exhausted the permanent
      reservation, we now have an ABBA deadlock - writeback completion has a
      transaction reserved and wants the ILOCK, and someone else has the ILOCK
      and wants a transaction reservation.
      
      Therefore, we create a per-inode writeback io completion queue + work
      item.  When writeback finishes, it can add the ioend to the per-inode
      queue and let the single worker item process that queue.  This
      dramatically cuts down on the number of kworkers and ILOCK contention in
      the system, and seems to have eliminated an occasional deadlock I was
      seeing while running generic/476.
      
      Testing with a program that simulates a heavy random-write workload to a
      single file demonstrates that the number of kworkers drops from
      approximately 120 threads per file to 1, without dramatically changing
      write bandwidth or pagecache access latency.
      
      Note that we leave the xfs-conv workqueue's max_active alone because we
      still want to be able to run ioend processing for as many inodes as the
      system can handle.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      cb357bf3
    • Darrick J. Wong's avatar
      xfs: scrub should only cross-reference with healthy btrees · 4fb7951f
      Darrick J. Wong authored
      Skip cross-referencing with a btree if the health report tells us that
      it's known to be bad.  This should reduce the dmesg spew considerably.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      4fb7951f
    • Darrick J. Wong's avatar
      xfs: scrub/repair should update filesystem metadata health · 4860a05d
      Darrick J. Wong authored
      Now that we have the ability to track sick metadata in-core, make scrub
      and repair update those health assessments after doing work.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      4860a05d
    • Darrick J. Wong's avatar
      xfs: hoist the already_fixed variable to the scrub context · 160b5a78
      Darrick J. Wong authored
      Now that we no longer memset the scrub context, we can move the
      already_fixed variable into the scrub context's state flags instead of
      passing around pointers to separate stack variables.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      160b5a78
    • Darrick J. Wong's avatar
      xfs: collapse scrub bool state flags into a single unsigned int · f8c2a225
      Darrick J. Wong authored
      Combine all the boolean state flags in struct xfs_scrub into a single
      unsigned int, because we're going to be adding more state flags soon.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      f8c2a225
    • Darrick J. Wong's avatar
      xfs: refactor scrub context initialization · 9d71e155
      Darrick J. Wong authored
      It's a little silly how the memset in scrub context initialization
      forces us to declare stack variables to preserve context variables
      across a retry.  Since the teardown functions already null out most of
      the ephemeral state (buffer pointers, btree cursors, etc.), just skip
      the memset and move the initialization as needed.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      9d71e155
  4. 15 Apr, 2019 13 commits
    • Darrick J. Wong's avatar
      xfs: report inode health via bulkstat · 89d139d5
      Darrick J. Wong authored
      Use space in the bulkstat ioctl structure to report any problems
      observed with the inode.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      89d139d5
    • Darrick J. Wong's avatar
      xfs: report AG health via AG geometry ioctl · 1302c6a2
      Darrick J. Wong authored
      Use the AG geometry info ioctl to report health status too.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      1302c6a2
    • Darrick J. Wong's avatar
      xfs: report fs and rt health via geometry structure · c23232d4
      Darrick J. Wong authored
      Use our newly expanded geometry structure to report the overall fs and
      realtime health status.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      c23232d4
    • Darrick J. Wong's avatar
      xfs: add a new ioctl to describe allocation group geometry · 7cd5006b
      Darrick J. Wong authored
      Add a new ioctl to describe an allocation group's geometry.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      7cd5006b
    • Dave Chinner's avatar
      xfs: bump XFS_IOC_FSGEOMETRY to v5 structures · 1b6d968d
      Dave Chinner authored
      Unfortunately, the V4 XFS_IOC_FSGEOMETRY structure is out of space so we
      can't just add a new field to it. Hence we need to bump the definition
      to V5 and and treat the V4 ioctl and structure similar to v1 to v3.
      
      While doing this, clean up all the definitions associated with the
      XFS_IOC_FSGEOMETRY ioctl.
      Signed-Off-By: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      [darrick: forward port to 5.1, expand structure size to 256 bytes]
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      1b6d968d
    • Darrick J. Wong's avatar
      xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem · 519841c2
      Darrick J. Wong authored
      If we know the filesystem metadata isn't healthy during unmount, we want
      to encourage the administrator to run xfs_repair right away.  We can't
      do this if BAD_SUMMARY will cause an unclean log unmount to force
      summary recalculation, so turn it off if the fs is bad.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      519841c2
    • Darrick J. Wong's avatar
      xfs: replace the BAD_SUMMARY mount flag with the equivalent health code · 39353ff6
      Darrick J. Wong authored
      Replace the BAD_SUMMARY mount flag with calls to the equivalent health
      tracking code.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      39353ff6
    • Darrick J. Wong's avatar
      xfs: track metadata health status · 6772c1f1
      Darrick J. Wong authored
      Add the necessary in-core metadata fields to keep track of which parts
      of the filesystem have been observed and which parts were observed to be
      unhealthy, and print a warning at unmount time if we have unfixed
      problems.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      6772c1f1
    • Wang Shilong's avatar
      xfs,fstrim: fix to return correct minlen · 2bf9d264
      Wang Shilong authored
      This patch tries to address two problems:
      
      1) return @minlen we used to trim to
      user space.
      
      2) return EINVAL if granularity is larger than
      avg size, even most of cases, granularity is small(4K),
      but if devices return a lager granularity for some reaons
      (testing, bugs etc), fstrim should return failure directly.
      Signed-off-by: default avatarWang Shilong <wshilong@ddn.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      2bf9d264
    • Brian Foster's avatar
      xfs: don't account extra agfl blocks as available · 1ca89fbc
      Brian Foster authored
      The block allocation AG selection code has parameters that allow a
      caller to perform multiple allocations from a single AG and
      transaction (under certain conditions). The parameters specify the
      total block allocation count required by the transaction and the AG
      selection code selects and locks an AG that will be able to satisfy
      the overall requirement. If the available block accounting
      calculation turns out to be inaccurate and a subsequent allocation
      call fails with -ENOSPC, the resulting transaction cancel leads to
      filesystem shutdown because the transaction is dirty.
      
      This exact problem can be reproduced with a highly parallel space
      consumer and fsstress workload running long enough to a large
      filesystem against -ENOSPC conditions. A bmbt block allocation
      request made for inode extent to bmap format conversion after an
      extent allocation is expected to be satisfied by the same AG and the
      same transaction as the extent allocation. The bmbt block allocation
      fails, however, because the block availability of the AG has changed
      since the AG was selected (outside of the blocks used for the extent
      itself).
      
      The inconsistent block availability calculation is caused by the
      deferred block freeing behavior of the AGFL. This immediately
      removes extra blocks from the AGFL to free up AGFL slots, but rather
      than immediately freeing such blocks as was done in the past, the
      block free is deferred such that said blocks are not available for
      allocation until the current transaction commits. The AG selection
      logic currently considers all AGFL blocks as available and executes
      shortly before any extra AGFL blocks are freed. This means the block
      availability of the current AG can change before the first
      allocation even occurs, but in practice a failure is more likely to
      manifest via a subsequent allocation because extent allocation
      usually has a contiguity requirement larger than a single block that
      can't be satisfied from the AGFL.
      
      In general, XFS prefers operational robustness to absolute
      allocation efficiency. In other words, we prefer to return -ENOSPC
      slightly earlier at the expense of not being able to allocate every
      last block in an AG to avoid this kind of problem. As such, update
      the AG block availability calculation to consider extra AGFL blocks
      as unavailable since they are immediately removed following the
      calculation and will not become available until the current
      transaction commits.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      1ca89fbc
    • Brian Foster's avatar
      xfs: shutdown after buf release in iflush cluster abort path · 22fedd80
      Brian Foster authored
      If xfs_iflush_cluster() fails due to corruption, the error path
      issues a shutdown and simulates an I/O completion to release the
      buffer. This code has a couple small problems. First, the shutdown
      sequence can issue a synchronous log force, which is unsafe to do
      with buffer locks held. Second, the simulated I/O completion does not
      guarantee the buffer is async and thus is unlocked and released.
      
      For example, if the last operation on the buffer was a read off disk
      prior to the corruption event, XBF_ASYNC is not set and the buffer
      is left locked and held upon return. This results in a memory leak
      as shown by the following message on module unload:
      
       BUG xfs_buf (...): Objects remaining in xfs_buf on __kmem_cache_shutdown()
      
      Fix both of these problems by setting XBF_ASYNC on the buffer prior
      to the simulated I/O error and performing the shutdown immediately
      after ioend processing when the buffer has been released.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      22fedd80
    • Brian Foster's avatar
      xfs: wake commit waiters on CIL abort before log item abort · 545aa41f
      Brian Foster authored
      XFS shutdown deadlocks have been reproduced by fstest generic/475.
      The deadlock signature involves log I/O completion running error
      handling to abort logged items and waiting for an inode cluster
      buffer lock in the buffer item unpin handler. The buffer lock is
      held by xfsaild attempting to flush an inode. The buffer happens to
      be pinned and so xfs_iflush() triggers an async log force to begin
      work required to get it unpinned. The log force is blocked waiting
      on the commit completion, which never occurs and thus leaves the
      filesystem deadlocked.
      
      The root problem is that aborted log I/O completion pots commit
      completion behind callback completion, which is unexpected for async
      log forces. Under normal running conditions, an async log force
      returns to the caller once the CIL ctx has been formatted/submitted
      and the commit completion event triggered at the tail end of
      xlog_cil_push(). If the filesystem has shutdown, however, we rely on
      xlog_cil_committed() to trigger the completion event and it happens
      to do so after running log item unpin callbacks. This makes it
      unsafe to invoke an async log force from contexts that hold locks
      that might also be required in log completion processing.
      
      To address this problem, wake commit completion waiters before
      aborting log items in the log I/O completion handler. This ensures
      that an async log force will not deadlock on held locks if the
      filesystem happens to shutdown. Note that it is still unsafe to
      issue a sync log force while holding such locks because a sync log
      force explicitly waits on the force completion, which occurs after
      log I/O completion processing.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      545aa41f
    • Brian Foster's avatar
      xfs: fix use after free in buf log item unlock assert · 4d09807f
      Brian Foster authored
      The xfs_buf_log_item ->iop_unlock() callback asserts that the buffer
      is unlocked when either non-stale or aborted. This assert occurs
      after the bli refcount has been dropped and the log item potentially
      freed. The aborted check is thus a potential use after free. This
      problem has been reproduced with KASAN enabled via generic/475.
      
      Fix up xfs_buf_item_unlock() to query aborted state before the bli
      reference is dropped to prevent a potential use after free.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      4d09807f
  5. 14 Apr, 2019 6 commits
    • Linus Torvalds's avatar
      Linux 5.1-rc5 · dc4060a5
      Linus Torvalds authored
      dc4060a5
    • Linus Torvalds's avatar
      Merge branch 'page-refs' (page ref overflow) · 6b3a7077
      Linus Torvalds authored
      Merge page ref overflow branch.
      
      Jann Horn reported that he can overflow the page ref count with
      sufficient memory (and a filesystem that is intentionally extremely
      slow).
      
      Admittedly it's not exactly easy.  To have more than four billion
      references to a page requires a minimum of 32GB of kernel memory just
      for the pointers to the pages, much less any metadata to keep track of
      those pointers.  Jann needed a total of 140GB of memory and a specially
      crafted filesystem that leaves all reads pending (in order to not ever
      free the page references and just keep adding more).
      
      Still, we have a fairly straightforward way to limit the two obvious
      user-controllable sources of page references: direct-IO like page
      references gotten through get_user_pages(), and the splice pipe page
      duplication.  So let's just do that.
      
      * branch page-refs:
        fs: prevent page refcount overflow in pipe_buf_get
        mm: prevent get_user_pages() from overflowing page refcount
        mm: add 'try_get_page()' helper function
        mm: make page ref count overflow check tighter and more explicit
      6b3a7077
    • Matthew Wilcox's avatar
      fs: prevent page refcount overflow in pipe_buf_get · 15fab63e
      Matthew Wilcox authored
      Change pipe_buf_get() to return a bool indicating whether it succeeded
      in raising the refcount of the page (if the thing in the pipe is a page).
      This removes another mechanism for overflowing the page refcount.  All
      callers converted to handle a failure.
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarMatthew Wilcox <willy@infradead.org>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      15fab63e
    • Linus Torvalds's avatar
      mm: prevent get_user_pages() from overflowing page refcount · 8fde12ca
      Linus Torvalds authored
      If the page refcount wraps around past zero, it will be freed while
      there are still four billion references to it.  One of the possible
      avenues for an attacker to try to make this happen is by doing direct IO
      on a page multiple times.  This patch makes get_user_pages() refuse to
      take a new page reference if there are already more than two billion
      references to the page.
      Reported-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarMatthew Wilcox <willy@infradead.org>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8fde12ca
    • Linus Torvalds's avatar
      mm: add 'try_get_page()' helper function · 88b1a17d
      Linus Torvalds authored
      This is the same as the traditional 'get_page()' function, but instead
      of unconditionally incrementing the reference count of the page, it only
      does so if the count was "safe".  It returns whether the reference count
      was incremented (and is marked __must_check, since the caller obviously
      has to be aware of it).
      
      Also like 'get_page()', you can't use this function unless you already
      had a reference to the page.  The intent is that you can use this
      exactly like get_page(), but in situations where you want to limit the
      maximum reference count.
      
      The code currently does an unconditional WARN_ON_ONCE() if we ever hit
      the reference count issues (either zero or negative), as a notification
      that the conditional non-increment actually happened.
      
      NOTE! The count access for the "safety" check is inherently racy, but
      that doesn't matter since the buffer we use is basically half the range
      of the reference count (ie we look at the sign of the count).
      Acked-by: default avatarMatthew Wilcox <willy@infradead.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      88b1a17d
    • Linus Torvalds's avatar
      mm: make page ref count overflow check tighter and more explicit · f958d7b5
      Linus Torvalds authored
      We have a VM_BUG_ON() to check that the page reference count doesn't
      underflow (or get close to overflow) by checking the sign of the count.
      
      That's all fine, but we actually want to allow people to use a "get page
      ref unless it's already very high" helper function, and we want that one
      to use the sign of the page ref (without triggering this VM_BUG_ON).
      
      Change the VM_BUG_ON to only check for small underflows (or _very_ close
      to overflowing), and ignore overflows which have strayed into negative
      territory.
      Acked-by: default avatarMatthew Wilcox <willy@infradead.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f958d7b5
  6. 13 Apr, 2019 10 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20190412' of git://git.kernel.dk/linux-block · 4443f8e6
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Set of fixes that should go into this round. This pull is larger than
        I'd like at this time, but there's really no specific reason for that.
        Some are fixes for issues that went into this merge window, others are
        not. Anyway, this contains:
      
         - Hardware queue limiting for virtio-blk/scsi (Dongli)
      
         - Multi-page bvec fixes for lightnvm pblk
      
         - Multi-bio dio error fix (Jason)
      
         - Remove the cache hint from the io_uring tool side, since we didn't
           move forward with that (me)
      
         - Make io_uring SETUP_SQPOLL root restricted (me)
      
         - Fix leak of page in error handling for pc requests (Jérôme)
      
         - Fix BFQ regression introduced in this merge window (Paolo)
      
         - Fix break logic for bio segment iteration (Ming)
      
         - Fix NVMe cancel request error handling (Ming)
      
         - NVMe pull request with two fixes (Christoph):
             - fix the initial CSN for nvme-fc (James)
             - handle log page offsets properly in the target (Keith)"
      
      * tag 'for-linus-20190412' of git://git.kernel.dk/linux-block:
        block: fix the return errno for direct IO
        nvmet: fix discover log page when offsets are used
        nvme-fc: correct csn initialization and increments on error
        block: do not leak memory in bio_copy_user_iov()
        lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs
        nvme: cancel request synchronously
        blk-mq: introduce blk_mq_complete_request_sync()
        scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
        virtio-blk: limit number of hw queues by nr_cpu_ids
        block, bfq: fix use after free in bfq_bfqq_expire
        io_uring: restrict IORING_SETUP_SQPOLL to root
        tools/io_uring: remove IOCQE_FLAG_CACHEHIT
        block: don't use for-inside-for in bio_for_each_segment_all
      4443f8e6
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · b60bc066
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights include:
      
        Stable fix:
      
         - Fix a deadlock in close() due to incorrect draining of RDMA queues
      
        Bugfixes:
      
         - Revert "SUNRPC: Micro-optimise when the task is known not to be
           sleeping" as it is causing stack overflows
      
         - Fix a regression where NFSv4 getacl and fs_locations stopped
           working
      
         - Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
      
         - Fix xfstests failures due to incorrect copy_file_range() return
           values"
      
      * tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
        NFSv4.1 fix incorrect return value in copy_file_range
        xprtrdma: Fix helper that drains the transport
        NFS: Fix handling of reply page vector
        NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
      b60bc066
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 87af0c38
      Linus Torvalds authored
      Pull SCSI fix from James Bottomley:
       "One obvious fix for a ciostor data corruption on error bug"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: csiostor: fix missing data copy in csio_scsi_err_handler()
      87af0c38
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 09bad0df
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Here's more than a handful of clk driver fixes for changes that came
        in during the merge window:
      
         - Fix the AT91 sama5d2 programmable clk prescaler formula
      
         - A bunch of Amlogic meson clk driver fixes for the VPU clks
      
         - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc
           clks as critical only when really needed
      
         - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate
           implementation
      
         - Use the right structure to test for a frequency table in i.MX's
           PLL_1416x driver"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: imx: Fix PLL_1416X not rounding rates
        clk: mediatek: fix clk-gate flag setting
        platform/x86: pmc_atom: Drop __initconst on dmi table
        clk: x86: Add system specific quirk to mark clocks as critical
        clk: meson: vid-pll-div: remove warning and return 0 on invalid config
        clk: meson: pll: fix rounding and setting a rate that matches precisely
        clk: meson-g12a: fix VPU clock parents
        clk: meson: g12a: fix VPU clock muxes mask
        clk: meson-gxbb: round the vdec dividers to closest
        clk: at91: fix programmable clock for sama5d2
      09bad0df
    • Linus Torvalds's avatar
      Merge tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · a3b84248
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - Add a DMA alias quirk for another Marvell SATA device (Andre
         Przywara)
      
       - Fix a pciehp regression that broke safe removal of devices (Sergey
         Miroshnichenko)
      
      * tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: pciehp: Ignore Link State Changes after powering off a slot
        PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
      a3b84248
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · cf60528f
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "A minor build fix for 64-bit FLATMEM configs.
      
        A fix for a boot failure on 32-bit powermacs.
      
        My commit to fix CLOCK_MONOTONIC across Y2038 broke the 32-bit VDSO on
        64-bit kernels, ie. compat mode, which is only used on big endian.
      
        The rewrite of the SLB code we merged in 4.20 missed the fact that the
        0x380 exception is also used with the Radix MMU to report out of range
        accesses. This could lead to an oops if userspace tried to read from
        addresses outside the user or kernel range.
      
        Thanks to: Aneesh Kumar K.V, Christophe Leroy, Larry Finger, Nicholas
        Piggin"
      
      * tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs
        powerpc/64s/radix: Fix radix segment exception handling
        powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
        powerpc/32: Fix early boot failure with RTAS built-in
      cf60528f
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 5ded8871
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The main thing is a fix to our FUTEX_WAKE_OP implementation which was
        unbelievably broken, but did actually work for the one scenario that
        GLIBC used to use.
      
        Summary:
      
         - Fix stack unwinding so we ignore user stacks
      
         - Fix ftrace module PLT trampoline initialisation checks
      
         - Fix terminally broken implementation of FUTEX_WAKE_OP atomics"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
        arm64: backtrace: Don't bother trying to unwind the userspace stack
        arm64/ftrace: fix inadvertent BUG() in trampoline check
      5ded8871
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6d0a5984
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Fix typos in user-visible resctrl parameters, and also fix assembly
        constraint bugs that might result in miscompilation"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm: Use stricter assembly constraints in bitops
        x86/resctrl: Fix typos in the mba_sc mount option
      6d0a5984
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 122c215b
      Linus Torvalds authored
      Pull timer fix from Ingo Molnar:
       "Fix the alarm_timer_remaining() return value"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        alarmtimer: Return correct remaining time
      122c215b
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5e6f1fee
      Linus Torvalds authored
      Pull scheduler fix from Ingo Molnar:
       "Fix a NULL pointer dereference crash in certain environments"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
      5e6f1fee