1. 29 Jul, 2021 12 commits
    • xfs: prevent spoofing of rtbitmap blocks when recovering buffers · 81a448d7
      Darrick J. Wong authored
      While reviewing the buffer item recovery code, the thought occurred to
      me: in V5 filesystems we use log sequence number (LSN) tracking to avoid
      replaying older metadata updates against newer log items.  However, we
      use the magic number of the ondisk buffer to find the LSN of the ondisk
      metadata, which means that if an attacker can control the layout of the
      realtime device precisely enough that the start of an rt bitmap block
      matches the magic and UUID of some other kind of block, they can control
      the purported LSN of that spoofed block and thereby break log replay.
      
      Since realtime bitmap and summary blocks don't have headers at all, we
      have no way to tell if a block really should be replayed.  The best we
      can do is replay unconditionally and hope for the best.
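
A minimal sketch of the replay decision described above, not the kernel code. The names `rb_kind` and `rb_should_replay` are hypothetical, and treating "disk LSN equal to the item LSN" as "skip" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a logged buffer; not a kernel type. */
enum rb_kind {
	RB_HAS_HEADER,		/* V5 metadata block carrying magic, UUID, LSN */
	RB_RT_HEADERLESS,	/* rt bitmap/summary block: raw data, no header */
};

/*
 * Decide whether a logged buffer update should be replayed.
 * disk_lsn is only meaningful for RB_HAS_HEADER. For headerless rt
 * blocks, anything that looks like a magic or an LSN is just payload
 * that could have been arranged on purpose, so it is never consulted.
 */
bool rb_should_replay(enum rb_kind kind, uint64_t disk_lsn, uint64_t item_lsn)
{
	if (kind == RB_RT_HEADERLESS)
		return true;		/* replay unconditionally */
	return disk_lsn < item_lsn;	/* skip when the disk copy is not older */
}
```

With the LSNs from a header-bearing block, a newer on-disk copy suppresses replay, while the same numbers found in an rt bitmap block have no effect on the decision.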
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    • xfs: limit iclog tail updates · 9d110014
      Dave Chinner authored
      From the department of "generic/482 keeps on giving", we bring you
      another tail update race condition:
      
      iclog:
      	S1			C1
      	+-----------------------+-----------------------+
      				 S2			EOIC
      
      Two checkpoints in a single iclog. One is complete, the other just
      contains the start record and overruns into a new iclog.
      
      Timeline:
      
      Before S1:	Cache flush, log tail = X
      At S1:		Metadata stable, write start record and checkpoint
      At C1:		Write commit record, set NEED_FUA
      		Single iclog checkpoint, so no need for NEED_FLUSH
      		Log tail still = X, so no need for NEED_FLUSH
      
      After C1,
      Before S2:	Cache flush, log tail = X
      At S2:		Metadata stable, write start record and checkpoint
      After S2:	Log tail moves to X+1
      At EOIC:	End of iclog, more journal data to write
      		Releases iclog
      		Not a commit iclog, so no need for NEED_FLUSH
      		Writes log tail X+1 into iclog.
      
      At this point, the iclog has tail X+1 and NEED_FUA set. There has
      been no cache flush for the metadata between X and X+1, and the
      iclog writes the new tail permanently to the log. This is sufficient
      to violate on disk metadata/journal ordering.
      
      We have two options here. The first is to detect this case in some
      manner and ensure that the partial checkpoint write sets NEED_FLUSH
      when the iclog is already marked NEED_FUA and the log tail changes.
      This seems somewhat fragile and quite complex to get right, and it
      doesn't actually make it obvious what underlying problem it is
      actually addressing from reading the code.
      
      The second option seems much cleaner to me, because it is derived
      directly from the requirements of the C1 commit record in the iclog.
      That is, when we write this commit record to the iclog, we've
      guaranteed that the metadata/data ordering is correct for tail
      update purposes. Hence if we only write the log tail into the iclog
      for the *first* commit record rather than the log tail at the last
      release, we guarantee that the log tail does not move past where the
      first commit record in the log expects it to be.
      
      IOWs, taking the first option means that replay of C1 becomes
      dependent on future operations doing the right thing, not just the
      C1 checkpoint itself doing the right thing. This makes log recovery
      almost impossible to reason about because now we have to take into
      account what might or might not have happened in the future when
      looking at checkpoints in the log rather than just having to
      reconstruct the past...
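
The second option can be sketched as follows. This is a simplified model, not the kernel implementation: `lt_iclog`, `lt_commit_record` and `lt_release` are hypothetical names, and falling back to the current tail when the iclog holds no commit record is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an in-core log buffer. */
struct lt_iclog {
	uint64_t tail_lsn;	/* tail that will be written to the log */
	bool	 tail_set;	/* has a commit record already fixed the tail? */
};

/* Called when a commit record is written into the iclog. */
void lt_commit_record(struct lt_iclog *ic, uint64_t cur_tail)
{
	assert(ic != NULL);
	if (!ic->tail_set) {		/* only the first commit record counts */
		ic->tail_lsn = cur_tail;
		ic->tail_set = true;
	}
}

/* Called when the iclog is released for writing; returns the tail used. */
uint64_t lt_release(struct lt_iclog *ic, uint64_t cur_tail)
{
	assert(ic != NULL);
	if (!ic->tail_set)		/* no commit record: assumed fallback */
		ic->tail_lsn = cur_tail;
	return ic->tail_lsn;
}
```

Replaying the timeline above with X = 10: C1 fixes the tail at 10, the tail then moves to 11, and the release at EOIC still writes 10.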
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: need to see iclog flags in tracing · b2ae3a9e
      Dave Chinner authored
      Because I cannot tell if the NEED_FLUSH flag is being set correctly
      by the log force and CIL push machinery without it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: Enforce attr3 buffer recovery order · d8f4c2d0
      Dave Chinner authored
      From the department of "WTAF? How did we miss that!?"...
      
      When we are recovering a buffer, the first thing we do is check the
      buffer magic number and extract the LSN from the buffer. If the LSN
      is older than the current LSN, we replay the modification to it. If
      the metadata on disk is newer than the transaction in the log, we
      skip it. This is a fundamental v5 filesystem metadata recovery
      behaviour.
      
      generic/482 failed with an attribute writeback failure during log
      recovery. The write verifier caught the corruption before it got
      written to disk, and the attr buffer dump looked like:
      
      XFS (dm-3): Metadata corruption detected at xfs_attr3_leaf_verify+0x275/0x2e0, xfs_attr3_leaf block 0x19be8
      XFS (dm-3): Unmount and run xfs_repair
      XFS (dm-3): First 128 bytes of corrupted metadata buffer:
      00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 4d 2a 01 e1  ........;...M*..
      00000010: 00 00 00 00 00 01 9b e8 00 00 00 01 00 00 05 38  ...............8
                                        ^^^^^^^^^^^^^^^^^^^^^^^
      00000020: df 39 5e 51 58 ac 44 b6 8d c5 e7 10 44 09 bc 17  .9^QX.D.....D...
      00000030: 00 00 00 00 00 02 00 83 00 03 00 cc 0f 24 01 00  .............$..
      00000040: 00 68 0e bc 0f c8 00 10 00 00 00 00 00 00 00 00  .h..............
      00000050: 00 00 3c 31 0f 24 01 00 00 00 3c 32 0f 88 01 00  ..<1.$....<2....
      00000060: 00 00 3c 33 0f d8 01 00 00 00 00 00 00 00 00 00  ..<3............
      00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      .....
      
      The highlighted bytes are the LSN that was replayed into the
      buffer: 0x100000538. This is cycle 1, block 0x538. Prior to replay,
      that block on disk looks like this:
      
      $ sudo xfs_db -c "fsb 0x417d" -c "type attr3" -c p /dev/mapper/thin-vol
      hdr.info.hdr.forw = 0
      hdr.info.hdr.back = 0
      hdr.info.hdr.magic = 0x3bee
      hdr.info.crc = 0xb5af0bc6 (correct)
      hdr.info.bno = 105448
      hdr.info.lsn = 0x100000900
                     ^^^^^^^^^^^
      hdr.info.uuid = df395e51-58ac-44b6-8dc5-e7104409bc17
      hdr.info.owner = 131203
      hdr.count = 2
      hdr.usedbytes = 120
      hdr.firstused = 3796
      hdr.holes = 1
      hdr.freemap[0-2] = [base,size]
      
      Note the LSN stamped into the buffer on disk: 1/0x900. The version
      on disk is much newer than the log transaction that was being
      replayed. That's a bug, and should -never- happen.
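
For readers unfamiliar with the encoding, the two LSNs above can be taken apart like this. The helper names are hypothetical; the split into an upper 32-bit cycle and a lower 32-bit block number follows the "cycle 1, block 0x538" reading given in the text.

```c
#include <stdint.h>

/* Upper 32 bits: log cycle. Lower 32 bits: block number within the log. */
uint32_t lsn_cycle(uint64_t lsn) { return (uint32_t)(lsn >> 32); }
uint32_t lsn_block(uint64_t lsn) { return (uint32_t)lsn; }

/* Returns <0, 0 or >0 when a is older than, equal to or newer than b. */
int lsn_cmp(uint64_t a, uint64_t b)
{
	if (lsn_cycle(a) != lsn_cycle(b))
		return lsn_cycle(a) < lsn_cycle(b) ? -1 : 1;
	if (lsn_block(a) != lsn_block(b))
		return lsn_block(a) < lsn_block(b) ? -1 : 1;
	return 0;
}
```

Comparing 0x100000900 (on disk) with 0x100000538 (the replayed transaction) gives a positive result: same cycle, later block, so the disk copy is the newer one.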
      
      So I immediately went to look at xlog_recover_get_buf_lsn() to check
      that we handled the LSN correctly. I was wondering if there was a
      similar "two commits with the same start LSN skips the second
      replay" problem with buffers. I didn't get that far, because I found
      a much more basic, rudimentary bug: xlog_recover_get_buf_lsn()
      doesn't recognise buffers with XFS_ATTR3_LEAF_MAGIC set in them!!!
      
      IOWs, attr3 leaf buffers fall through the magic number checks
      unrecognised, so trigger the "recover immediately" behaviour instead
      of undergoing an LSN check. IOWs, we incorrectly replay ATTR3 leaf
      buffers and that causes silent on disk corruption of inode attribute
      forks and potentially other things....
      
      Git history shows this is *another* zero day bug, this time
      introduced in commit 50d5c8d8 ("xfs: check LSN ordering for v5
      superblocks during recovery") which failed to handle the attr3 leaf
      buffers in recovery. And we've failed to handle them ever since...
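
A rough illustration of the lookup that was missing, written as a standalone parser rather than the kernel function. The byte offsets (magic at 8, LSN at 0x18) are read off the hex dump above; `al_get_buf_lsn`, `AL_NO_LSN` and the other names are hypothetical, and using an all-ones value for "no LSN found" is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define AL_ATTR3_LEAF_MAGIC	0x3bee		/* hdr.info.hdr.magic in the dump */
#define AL_NO_LSN		((uint64_t)-1)	/* assumed "unrecognised" marker */

/* Read big-endian on-disk fields byte by byte. */
uint16_t al_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

uint64_t al_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Return the LSN stamped in the block, or AL_NO_LSN if the magic is unknown. */
uint64_t al_get_buf_lsn(const uint8_t *blk)
{
	switch (al_be16(blk + 8)) {
	case AL_ATTR3_LEAF_MAGIC:	/* the case that had been left out */
		return al_be64(blk + 0x18);
	default:
		return AL_NO_LSN;	/* caller replays without an LSN check */
	}
}
```

Fed the first 32 bytes of the dump, the sketch returns 0x100000538; without the `AL_ATTR3_LEAF_MAGIC` case the same block would fall into the default branch.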
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: logging the on disk inode LSN can make it go backwards · 32baa63d
      Dave Chinner authored
      When we log an inode, we format the "log inode" core and set an LSN
      in that inode core. We do that via xfs_inode_item_format_core(),
      which calls:
      
      	xfs_inode_to_log_dinode(ip, dic, ip->i_itemp->ili_item.li_lsn);
      
      to format the log inode. It writes the LSN from the inode item into
      the log inode, and if recovery decides the inode item needs to be
      replayed, it recovers the log inode LSN field and writes it into the
      on disk inode LSN field.
      
      Now this might seem like a reasonable thing to do, but it is wrong
      on multiple levels. Firstly, if the item is not yet in the AIL,
      item->li_lsn is zero. i.e. the first time the inode is logged and
      formatted, the LSN we write into the log inode will be zero. If we
      only log it once, recovery will run and can write this zero LSN into
      the inode.
      
      This means that the next time the inode is logged and log recovery
      runs, it will *always* replay changes to the inode regardless of
      whether the inode is newer on disk than the version in the log and
      that violates the entire purpose of recording the LSN in the inode
      at writeback time (i.e. to stop it going backwards in time on disk
      during recovery).
      
      Secondly, if we commit the CIL to the journal so the inode item
      moves to the AIL, and then relog the inode, the LSN that gets
      stamped into the log inode will be the LSN of the inode's current
      location in the AIL, not its age on disk. And it's not the LSN that
      will be associated with the current change. That means when log
      recovery replays this inode item, the LSN that ends up on disk is
      the LSN for the previous changes in the log, not the current
      changes being replayed. IOWs, after recovery the LSN on disk is not
      in sync with the LSN of the modifications that were replayed into
      the inode. This, again, violates the recovery ordering semantics
      that on-disk writeback LSNs provide.
      
      Hence the inode LSN in the log dinode is -always- invalid.
      
      Thirdly, recovery actually has the LSN of the log transaction it is
      replaying right at hand - it uses it to determine if it should
      replay the inode by comparing it to the on-disk inode's LSN. But it
      doesn't use that LSN to stamp the LSN into the inode which will be
      written back when the transaction is fully replayed. It uses the one
      in the log dinode, which we know is always going to be incorrect.
      
      Looking back at the change history, the inode logging was broken by
      commit 93f958f9 ("xfs: cull unnecessary icdinode fields") way
      back in 2016 by a stupid idiot who thought he knew how this code
      worked. i.e. me. That commit replaced an in memory di_lsn field that
      was updated only at inode writeback time from the inode item.li_lsn
      value - and hence always contained the same LSN that appeared in the
      on-disk inode - with a read of the inode item LSN at inode format
      time. Clearly these are not the same thing.
      
      Before 93f958f9, the log recovery behaviour was irrelevant,
      because the LSN in the log inode always matched the on-disk LSN at
      the time the inode was logged, hence recovery of the transaction
      would never make the on-disk LSN in the inode go backwards or get
      out of sync.
      
      A symptom of the problem is this, caught from a failure of
      generic/482. Before log recovery, the inode has been allocated but
      never used:
      
      xfs_db> inode 393388
      xfs_db> p
      core.magic = 0x494e
      core.mode = 0
      ....
      v3.crc = 0x99126961 (correct)
      v3.change_count = 0
      v3.lsn = 0
      v3.flags2 = 0
      v3.cowextsize = 0
      v3.crtime.sec = Thu Jan  1 10:00:00 1970
      v3.crtime.nsec = 0
      
      After log recovery:
      
      xfs_db> p
      core.magic = 0x494e
      core.mode = 020444
      ....
      v3.crc = 0x23e68f23 (correct)
      v3.change_count = 2
      v3.lsn = 0
      v3.flags2 = 0
      v3.cowextsize = 0
      v3.crtime.sec = Thu Jul 22 17:03:03 2021
      v3.crtime.nsec = 751000000
      ...
      
      You can see that the LSN of the on-disk inode is 0, even though it
      clearly has been written to disk. I point out this inode, because
      the generic/482 failure occurred because several adjacent inodes in
      this specific inode cluster were not replayed correctly and still
      appeared to be zero on disk when all the other metadata (inobt,
      finobt, directories, etc) indicated they should be allocated and
      written back.
      
      The fix for this is two-fold. The first is that we need to either
      revert the LSN changes in 93f958f9 or stop logging the inode LSN
      altogether. If we do the former, log recovery does not need to
      change but we add 8 bytes of memory per inode to store what is
      largely a write-only inode field. If we do the latter, log recovery
      needs to stamp the on-disk inode in the same manner that inode
      writeback does.
      
      I prefer the latter, because we shouldn't really be trying to log
      and replay changes to the on disk LSN as the on-disk value is the
      canonical source of the on-disk version of the inode. It also
      matches the way we recover buffer items - we create a buf_log_item
      that carries the current recovery transaction LSN that gets stamped
      into the buffer by the write verifier when it gets written back
      when the transaction is fully recovered.
      
      However, this might break log recovery on older kernels even more,
      so I'm going to simply ignore the logged value in recovery and stamp
      the on-disk inode with the LSN of the transaction being recovered
      that will trigger writeback on transaction recovery completion. This
      will ensure that the on-disk inode LSN always reflects the LSN of
      the last change that was written to disk, regardless of whether it
      comes from log recovery or runtime writeback.
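
The chosen behaviour can be sketched as below. The structures and `ri_recover_inode` are hypothetical stand-ins that carry only a mode and an LSN; the real inode recovery code handles far more state. LSNs are compared as plain 64-bit integers, which orders them by cycle and then block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ri_log_dinode  { uint16_t mode; uint64_t lsn; };	/* as found in the log */
struct ri_disk_inode  { uint16_t mode; uint64_t lsn; };	/* as found on disk */

/* Replay one logged inode; returns true if the on-disk copy was updated. */
bool ri_recover_inode(struct ri_disk_inode *dip,
		      const struct ri_log_dinode *ldip,
		      uint64_t current_lsn)
{
	assert(dip != NULL && ldip != NULL);

	/* The disk copy is newer than this transaction: leave it alone. */
	if (dip->lsn > current_lsn)
		return false;

	dip->mode = ldip->mode;
	/* Ignore ldip->lsn; stamp the LSN of the transaction being recovered. */
	dip->lsn = current_lsn;
	return true;
}
```

Starting from the never-used inode shown earlier (LSN 0) and a log dinode that also carries LSN 0, recovery of a transaction at a made-up LSN of 0x100000538 leaves the disk inode stamped with 0x100000538 instead of 0, so a later attempt to replay an older transaction is skipped.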
      
      Fixes: 93f958f9 ("xfs: cull unnecessary icdinode fields")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: avoid unnecessary waits in xfs_log_force_lsn() · 8191d822
      Dave Chinner authored
      Before waiting on an iclog in xfs_log_force_lsn(), we don't check to
      see if the iclog has already been completed and the contents on
      stable storage. We check for completed iclogs in xfs_log_force(), so
      we should do the same thing for xfs_log_force_lsn().
      
      This fixed some random up-to-30s pauses seen in unmounting
      filesystems in some tests. A log force ends up waiting on a completed
      iclog, and that doesn't then get flushed (and hence the log force
      get completed) until the background log worker issues a log force
      that flushes the iclog in question. Then the unmount unblocks and
      continues.
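
As a toy model of the missing check: the state names below are a hypothetical, reduced set and `wf_must_wait` is not a kernel function. The only point illustrated is that a caller should test for completion before going to sleep on an iclog.

```c
#include <stdbool.h>

/* Hypothetical, reduced iclog lifecycle. */
enum wf_state {
	WF_WANT_SYNC,	/* closed, waiting to be written */
	WF_SYNCING,	/* write in flight */
	WF_COMPLETED,	/* contents already on stable storage */
};

/*
 * A completed iclog will not generate another wakeup on its own, so a
 * waiter that sleeps on it stays blocked until something else forces
 * the log again.
 */
bool wf_must_wait(enum wf_state st)
{
	return st != WF_COMPLETED;
}
```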
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: log forces imply data device cache flushes · 2bf1ec0f
      Dave Chinner authored
      After fixing the tail_lsn vs cache flush race, generic/482 continued
      to fail in a similar way where cache flushes were missing before
      iclog FUA writes. Tracing of iclog state changes during the fsstress
      workload portion of the test (via xlog_iclog* events) indicated that
      iclog writes were coming from two sources - CIL pushes and log
      forces (due to fsync/O_SYNC operations). All of the cases where a
      recovery problem was triggered indicated that the log force was the
      source of the iclog write that was not preceded by a cache flush.
      
      This was an oversight in the modifications made in commit
      eef983ff ("xfs: journal IO cache flush reductions"). Log forces
      for fsync imply a data device cache flush has been issued if an
      iclog was flushed to disk and is indicated to the caller via the
      log_flushed parameter so they can elide the device cache flush if
      the journal issued one.
      
      The change in eef983ff results in iclogs only issuing a cache
      flush if XLOG_ICL_NEED_FLUSH is set on the iclog, but this was not
      added to the iclogs that the log force code flushes to disk. Hence
      log forces are no longer guaranteeing that a cache flush is issued,
      hence opening up a potential on-disk ordering failure.
      
      Log forces should also set XLOG_ICL_NEED_FUA to ensure that
      the actual iclogs it forces to the journal are also on stable
      storage before it returns to the caller.
      
      This patch introduces the xlog_force_iclog() helper function to
      encapsulate the process of taking a reference to an iclog, switching
      its state to WANT_SYNC and flushing it to stable storage correctly.
      
      Both xfs_log_force() and xfs_log_force_lsn() are converted to use
      it, as is xlog_unmount_write() which has an elaborate method of
      doing exactly the same "write this iclog to stable storage"
      operation.
      
      Further, if the log force code needs to wait on an iclog in the
      WANT_SYNC state, it needs to ensure that iclog also results in a
      cache flush being issued. This covers the case where the iclog
      contains the commit record of the CIL flush that the log force
      triggered, but it hasn't been written yet because there is still an
      active reference to the iclog.
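
In outline, the helper does three things: take a reference, mark the iclog for flush and FUA, and close it for writing. The sketch below models only that. The `FI_` names, flag values and structure are made up for illustration and stand in for XLOG_ICL_NEED_FLUSH, XLOG_ICL_NEED_FUA and the real iclog; the actual write submission is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FI_NEED_FLUSH	(1u << 0)	/* issue a cache flush before the write */
#define FI_NEED_FUA	(1u << 1)	/* the write itself must hit stable storage */

enum fi_state { FI_ACTIVE, FI_WANT_SYNC };

struct fi_iclog {
	enum fi_state	state;
	unsigned int	refcount;
	uint32_t	flags;
};

/* Prepare an iclog so that writing it behaves like a full log force. */
void fi_force_iclog(struct fi_iclog *ic)
{
	assert(ic != NULL);
	ic->refcount++;				/* hold it while we work on it */
	ic->flags |= FI_NEED_FLUSH | FI_NEED_FUA;
	if (ic->state == FI_ACTIVE)
		ic->state = FI_WANT_SYNC;	/* no more log data goes in */
}
```

An iclog that is already in WANT_SYNC keeps its state but still picks up both flags, which is the case covered by the paragraph above.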
      
      Note: this whole cache flush whack-a-mole patch is a result of log
      forces still being iclog state centric rather than being CIL
      sequence centric. Most of this nasty code will go away in future
      when log forces are converted to wait on CIL sequence push
      completion rather than iclog completion. With the CIL push algorithm
      guaranteeing that the CIL checkpoint is fully on stable storage when
      it completes, we no longer need to iterate iclogs and push them to
      ensure a CIL sequence push has completed and so all this nasty iclog
      iteration and flushing code will go away.
      
      Fixes: eef983ff ("xfs: journal IO cache flush reductions")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: factor out forced iclog flushes · 45eddb41
      Dave Chinner authored
      We force iclogs in several places - we need them all to have the
      same cache flush semantics, so start by factoring out the iclog
      force into a common helper.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: fix ordering violation between cache flushes and tail updates · 0dc8f7f1
      Dave Chinner authored
      There is a race between the new CIL async data device metadata IO
      completion cache flush and the log tail in the iclog the flush
      covers being updated. This can be seen by repeating generic/482 in a
      loop and eventually log recovery fails with a failure such as this:
      
      XFS (dm-3): Starting recovery (logdev: internal)
      XFS (dm-3): bad inode magic/vsn daddr 228352 #0 (magic=0)
      XFS (dm-3): Metadata corruption detected at xfs_inode_buf_verify+0x180/0x190, xfs_inode block 0x37c00 xfs_inode_buf_verify
      XFS (dm-3): Unmount and run xfs_repair
      XFS (dm-3): First 128 bytes of corrupted metadata buffer:
      00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      XFS (dm-3): metadata I/O error in "xlog_recover_items_pass2+0x55/0xc0" at daddr 0x37c00 len 32 error 117
      
      Analysis of the logwrite replay shows that there were no writes to
      the data device between the FUA @ write 124 and the FUA @ write
      125, but log recovery @ 125 failed. The difference was the one log
      write @ 125 moved the tail of the log forwards from (1,8) to (1,32)
      and so the inode create intent in (1,8) was not replayed and so the
      inode cluster was zero on disk when replay of the first inode item
      in (1,32) was attempted.
      
      What this meant was that the journal write that occurred @ 125
      did not ensure that metadata completed before the iclog was written
      was correctly on stable storage. The tail of the log moved forward,
      so IO must have been completed between the two iclog writes. This
      means that there is a race condition between the unconditional async
      cache flush in the CIL push work and the tail LSN that is written to
      the iclog. This happens like so:
      
      CIL push work				AIL push work
      -------------				-------------
      Add to committing list
      start async data dev cache flush
      .....
      <flush completes>
      <all writes to old tail lsn are stable>
      xlog_write
        ....					push inode create buffer
      					<start IO>
      					.....
      xlog_write(commit record)
        ....					<IO completes>
        					log tail moves
        					  xlog_assign_tail_lsn()
      start_lsn == commit_lsn
        <no iclog preflush!>
      xlog_state_release_iclog
        __xlog_state_release_iclog()
          <writes *new* tail_lsn into iclog>
        xlog_sync()
          ....
          submit_bio()
      <tail in log moves forward without flushing written metadata>
      
      Essentially, this can only occur if the commit iclog is issued
      without a cache flush. If the iclog bio is submitted with
      REQ_PREFLUSH, then it will guarantee that all the completed IO is
      on stable storage before the iclog bio with the new tail LSN in it
      is written to the log.
      
      IOWs, the tail lsn that is written to the iclog needs to be sampled
      *before* we issue the cache flush that guarantees all IO up to that
      LSN has been completed.
      
      To fix this without giving up the performance advantage of the
      flush/FUA optimisations (e.g. g/482 runtime halves with 5.14-rc1
      compared to 5.13), we need to ensure that we always issue a cache
      flush if the tail LSN changes between the initial async flush and
      the commit record being written. This requires sampling the tail_lsn
      before we start the flush, and then passing the sampled tail LSN to
      xlog_state_release_iclog() so it can determine if the tail LSN
      has changed while writing the checkpoint. If the tail LSN has
      changed, then it needs to set the NEED_FLUSH flag on the iclog and
      we'll issue another cache flush before writing the iclog.
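
A compact sketch of that comparison, using hypothetical names (`ts_iclog`, `ts_release_iclog`, `TS_NEED_FLUSH`) and a made-up flag value. It models only the decision, not the flush or the log write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_NEED_FLUSH	(1u << 0)	/* stand-in for the iclog flush flag */

struct ts_iclog {
	uint32_t flags;
	uint64_t tail_lsn;
};

/*
 * old_tail: tail LSN sampled before the async cache flush was started.
 * cur_tail: tail LSN observed when the iclog is released.
 */
void ts_release_iclog(struct ts_iclog *ic, uint64_t old_tail, uint64_t cur_tail)
{
	assert(ic != NULL);
	/* The tail moved, so metadata IO completed after the flush began. */
	if (old_tail != cur_tail)
		ic->flags |= TS_NEED_FLUSH;
	ic->tail_lsn = cur_tail;
}
```

With the values from the failure analysis, a tail sampled at (1,8) and released at (1,32) sets the flag, so the iclog write is preceded by another cache flush.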
      
      Fixes: eef983ff ("xfs: journal IO cache flush reductions")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: fold __xlog_state_release_iclog into xlog_state_release_iclog · 9d392064
      Dave Chinner authored
      Fold __xlog_state_release_iclog into its only caller to make an
      upcoming fix easier.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      [hch: split from a larger patch]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: external logs need to flush data device · b5d721ea
      Dave Chinner authored
      The recent journal flush/FUA changes replaced the flushing of the
      data device on every iclog write with an up-front async data device
      cache flush. Unfortunately, the assumption on which this was based
      has been proven incorrect by the flush vs log tail update
      ordering issue. As the fix for that issue uses the
      XLOG_ICL_NEED_FLUSH flag to indicate that the data device needs a cache
      flush, we now need to (once again) ensure that an iclog write to an
      external log that needs a cache flush actually issues a
      cache flush to the data device as well as the log device.
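
One way to picture the rule, as a pure decision function with hypothetical names (`el_plan`, `el_plan_flushes`, `EL_NEED_FLUSH`). With an internal log the journal and the metadata share a device, so the preflush on the iclog write covers both; with an external log that preflush only reaches the log device.

```c
#include <stdbool.h>
#include <stdint.h>

#define EL_NEED_FLUSH	(1u << 0)	/* stand-in for the iclog flush flag */

struct el_plan {
	bool preflush_on_iclog_write;	/* flush the device the log lives on */
	bool separate_data_dev_flush;	/* extra flush aimed at the data device */
};

struct el_plan el_plan_flushes(uint32_t iclog_flags, bool external_log)
{
	struct el_plan p = { false, false };

	if (iclog_flags & EL_NEED_FLUSH) {
		p.preflush_on_iclog_write = true;
		/* Only needed when the log is not on the data device. */
		p.separate_data_dev_flush = external_log;
	}
	return p;
}
```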
      
      Fixes: eef983ff ("xfs: journal IO cache flush reductions")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: flush data dev on external log write · b1e27239
      Dave Chinner authored
      We incorrectly flush the log device instead of the data device when
      trying to ensure metadata is correctly on disk before writing the
      unmount record.
      
      Fixes: eef983ff ("xfs: journal IO cache flush reductions")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  2. 15 Jul, 2021 7 commits
    • xfs: detect misaligned rtinherit directory extent size hints · b102a46c
      Darrick J. Wong authored
      If we encounter a directory that has been configured to pass on an
      extent size hint to a new realtime file and the hint isn't an integer
      multiple of the rt extent size, we should flag the hint for
      administrative review because that is a misconfiguration (that other
      parts of the kernel will fix automatically).
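
The condition being flagged boils down to a modulo test. The function below is a hypothetical standalone version; both sizes are taken to be in filesystem blocks, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when a directory would pass on an extent size hint that is not
 * a whole multiple of the realtime extent size.
 */
bool rt_hint_misaligned(bool rtinherit, bool extszinherit,
			uint32_t extsize_hint, uint32_t rextsize)
{
	assert(rextsize > 0);	/* an rt extent is at least one block */

	if (!rtinherit || !extszinherit)
		return false;	/* hint is not inherited by rt files */
	return extsize_hint % rextsize != 0;
}
```

For example, a hint of 7 blocks against an rt extent size of 4 blocks is flagged, while a hint of 8 is not.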
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: fix an integer overflow error in xfs_growfs_rt · 0925fecc
      Darrick J. Wong authored
      During a realtime grow operation, we run a single transaction for each
      rt bitmap block added to the filesystem.  This means that each step has
      to be careful to increase sb_rblocks appropriately.
      
      Fix the integer overflow error in this calculation that can happen when
      the extent size is very large.  Found by running growfs to add a rt
      volume to a filesystem formatted with a 1g rt extent size.
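
To see why a 1g rt extent size triggers this, consider how many rt blocks a single bitmap block can describe. The sketch below is not the kernel expression; it assumes 4 KiB filesystem blocks (so a 1g rt extent is 262144 blocks) and one bitmap bit per rt extent, and the function names are hypothetical.

```c
#include <stdint.h>

/* rt blocks covered by `bmblocks` bitmap blocks, computed in 32 bits. */
uint32_t rblocks_narrow(uint32_t bmblocks, uint32_t blocksize,
			uint32_t rextsize)
{
	return bmblocks * 8u * blocksize * rextsize;	/* wraps at 2^32 */
}

/* Same product, widened to 64 bits before multiplying. */
uint64_t rblocks_wide(uint32_t bmblocks, uint32_t blocksize,
		      uint32_t rextsize)
{
	return (uint64_t)bmblocks * 8u * blocksize * rextsize;
}
```

One bitmap block holds 8 * 4096 = 2^15 bits, and each bit stands for 2^18 blocks, so the product is 2^33. That does not fit in 32 bits, and the narrow version wraps to 0.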
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: improve FSGROWFSRT precondition checking · 0e2af929
      Darrick J. Wong authored
      Improve the checking at the start of a realtime grow operation so that
      we avoid accidentally setting a new extent size that is too large and avoid
      adding an rt volume to a filesystem with rmap or reflink because we
      don't support rt rmap or reflink yet.
      
      While we're at it, separate the checks so that we're only testing one
      aspect at a time.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: don't expose misaligned extszinherit hints to userspace · 5aa5b278
      Darrick J. Wong authored
      Commit 603f000b changed xfs_ioctl_setattr_check_extsize to reject an
      attempt to set an EXTSZINHERIT extent size hint on a directory with
      RTINHERIT set if the hint isn't a multiple of the realtime extent size.
      However, I have recently discovered that it is possible to change the
      realtime extent size when adding a rt device to a filesystem, which
      means that the existence of directories with misaligned inherited hints
      is not an accident.
      
      As a result, it's possible that someone could have set a valid hint and
      added an rt volume with a different rt extent size, which invalidates
      the ondisk hints.  After such a sequence, FSGETXATTR will report a
      misaligned hint, which FSSETXATTR will trip over, causing confusion if
      the user was doing the usual GET/SET sequence to change some other
      attribute.  Change xfs_fill_fsxattr to omit the hint if it isn't aligned
      properly.
      
      Fixes: 603f000b ("xfs: validate extsz hints against rt extent size when rtinherit is set")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: correct the narrative around misaligned rtinherit/extszinherit dirs · 83193e5e
      Darrick J. Wong authored
      While auditing the realtime growfs code, I realized that the GROWFSRT
      ioctl (and by extension xfs_growfs) has always allowed sysadmins to
      change the realtime extent size when adding a realtime section to the
      filesystem.  Since we also have always allowed sysadmins to set
      RTINHERIT and EXTSZINHERIT on directories even if there is no realtime
      device, this invalidates the premise laid out in the comments added in
      commit 603f000b.
      
      In other words, this is not a case of inadequate metadata validation.
      This is a case of nearly forgotten (and apparently untested) but
      supported functionality.  Update the comments to reflect what we've
      learned, and remove the log message about correcting the misalignment.
      
      Fixes: 603f000b ("xfs: validate extsz hints against rt extent size when rtinherit is set")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      83193e5e
    • Darrick J. Wong's avatar
      xfs: reset child dir '..' entry when unlinking child · 5838d035
      Darrick J. Wong authored
      While running xfs/168, I noticed a second source of post-shrink
      corruption errors causing shutdowns.
      
      Let's say that directory B has a low inode number and is a child of
      directory A, which has a high number.  If B is empty but open, and
      unlinked from A, B's dotdot link continues to point to A.  If A is then
      unlinked and the filesystem shrunk so that A is no longer a valid inode,
      a subsequent AIL push of B will trip the inode verifiers because the
      dotdot entry points outside of the filesystem.
      
      To avoid this problem, reset B's dotdot entry to the root directory when
      unlinking directories, since the root directory cannot be removed.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      5838d035
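The scenario in the commit above can be modelled in a few lines of C. This is a toy model under stated assumptions: `struct sketch_dir`, `sketch_ino_valid()`, and `sketch_unlink_child_dir()` are hypothetical, and inode validity is reduced to a single upper bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified directory: its own inode number and its '..' target. */
struct sketch_dir {
    uint64_t ino;
    uint64_t dotdot;
};

/* Toy verifier: an inode number is valid if it lies within the filesystem. */
bool sketch_ino_valid(uint64_t ino, uint64_t max_ino)
{
    return ino <= max_ino;
}

/* On unlink, point the child's '..' at the root, which cannot be removed. */
void sketch_unlink_child_dir(struct sketch_dir *child, uint64_t root_ino)
{
    assert(child != 0);
    child->dotdot = root_ino;
}
```

If B (inode 10) had kept pointing at A (inode 900) and the filesystem were shrunk so that only inodes up to 500 remain, `sketch_ino_valid(b.dotdot, 500)` would fail; after the reset it passes.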
    • Darrick J. Wong's avatar
      xfs: check for sparse inode clusters that cross new EOAG when shrinking · da062d16
      Darrick J. Wong authored
      While running xfs/168, I noticed occasional write verifier shutdowns
      involving inodes at the very end of the filesystem.  Existing inode
      btree validation code checks that all inode clusters are fully contained
      within the filesystem.
      
      However, due to inadequate checking in the fs shrink code, it's possible
      that there could be a sparse inode cluster at the end of the filesystem
      where the upper inodes of the cluster are marked as holes and the
      corresponding blocks are free.  In this case, the last blocks in the AG
      are listed in the bnobt.  This enables the shrink to proceed but results
      in a filesystem that trips the inode verifiers.  Fix this by disallowing
      the shrink.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      da062d16
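The condition that the commit above rejects can be sketched as an interval check. This is an illustration only: the function names are hypothetical, all quantities are assumed to be AG-relative block counts, and the real check works on inode btree records rather than raw block ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if an inode cluster starting at chunk_start and spanning chunk_len
 * blocks straddles the proposed new end of the AG (new_ag_len blocks).
 */
bool sketch_chunk_crosses_eoag(uint32_t chunk_start, uint32_t chunk_len,
                               uint32_t new_ag_len)
{
    /* Widen before adding so the sum cannot wrap. */
    uint64_t chunk_end = (uint64_t)chunk_start + chunk_len;

    assert(chunk_len > 0);
    return chunk_start < new_ag_len && chunk_end > new_ag_len;
}

/*
 * The shrink is refused whenever a cluster straddles the new end, even if
 * the blocks past the new end are holes that the free space btree lists
 * as free.
 */
bool sketch_shrink_allowed(uint32_t chunk_start, uint32_t chunk_len,
                           uint32_t new_ag_len)
{
    return !sketch_chunk_crosses_eoag(chunk_start, chunk_len, new_ag_len);
}
```

The point of the sketch is that free blocks alone are not sufficient evidence: a cluster at blocks 96 to 103 blocks a shrink to 100 blocks regardless of which of its inodes are sparse.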
  3. 11 Jul, 2021 11 commits
    • Linus Torvalds's avatar
      Linux 5.14-rc1 · e73f0f0e
      Linus Torvalds authored
      e73f0f0e
    • Hugh Dickins's avatar
      mm/rmap: try_to_migrate() skip zone_device !device_private · 6c855fce
      Hugh Dickins authored
      I know nothing about zone_device pages and !device_private pages; but if
      try_to_migrate_one() will do nothing for them, then it's better that
      try_to_migrate() filter them first, than trawl through all their vmas.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Alistair Popple <apopple@nvidia.com>
      Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c855fce
    • Hugh Dickins's avatar
      mm/rmap: fix new bug: premature return from page_mlock_one() · 023e1a8d
      Hugh Dickins authored
      In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
      cleared by the time it got page table lock, page_vma_mapped_walk_done()
      must be called before returning, either explicitly, or by a final call
      to page_vma_mapped_walk() - otherwise the page table remains locked.
      
      Fixes: cd62734c ("mm/rmap: split try_to_munlock from try_to_unmap")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Alistair Popple <apopple@nvidia.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/
      Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      023e1a8d
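The locking rule behind the commit above can be illustrated with a mock walker in userspace C. Everything here is a hypothetical stand-in: `struct sketch_walk` and the `sketch_*` functions only mimic the shape of a walk that holds a lock between iterations, and do not reproduce the kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page table walk state. */
struct sketch_walk {
    int  remaining;  /* mappings still to visit */
    bool locked;     /* true while the mock page table lock is held */
};

/* Advance to the next mapping: takes the lock, or drops it when finished. */
static bool sketch_walk_next(struct sketch_walk *w)
{
    if (w->remaining > 0) {
        w->remaining--;
        w->locked = true;
        return true;
    }
    w->locked = false;
    return false;
}

/* Abandon the walk early, dropping the lock. */
static void sketch_walk_done(struct sketch_walk *w)
{
    w->locked = false;
}

/* Returns true if the page was mlocked. */
bool sketch_mlock_one(struct sketch_walk *w, bool vm_locked)
{
    assert(w != 0);

    while (sketch_walk_next(w)) {
        if (!vm_locked) {
            /* Raced with munlock: release the lock before bailing out. */
            sketch_walk_done(w);
            return false;
        }
        /* ... the real code would mlock the page here ... */
        sketch_walk_done(w);
        return true;
    }
    return false;
}
```

Dropping the `sketch_walk_done()` call on the `!vm_locked` path reproduces the class of bug being fixed: the function returns while `locked` is still true.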
    • Hugh Dickins's avatar
      mm/rmap: fix old bug: munlocking THP missed other mlocks · d9770fcc
      Hugh Dickins authored
      The kernel recovers in due course from missing Mlocked pages: but there
      was no point in calling page_mlock() (formerly known as
      try_to_munlock()) on a THP, because nothing got done even when it was
      found to be mapped in another VM_LOCKED vma.
      
      It's true that we need to be careful: Mlocked accounting of pte-mapped
      THPs is too difficult (so consistently avoided); but Mlocked accounting
      of only-pmd-mapped THPs is supposed to work, even when multiple mappings
      are mlocked and munlocked or munmapped.  Refine the tests.
      
      There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
      page_mlock_one() does not even have to worry about that complication.
      
      (I said the kernel recovers: but would page reclaim be likely to split
      THP before rediscovering that it's VM_LOCKED? I've not followed that up)
      
      Fixes: 9a73f61b ("thp, mlock: do not mlock PTE-mapped file huge pages")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d9770fcc
    • Hugh Dickins's avatar
      mm/rmap: fix comments left over from recent changes · 64b586d1
      Hugh Dickins authored
      Parallel developments in mm/rmap.c have left behind some out-of-date
      comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
      in try_to_migrate() itself), and try_to_migrate() returns nothing at
      all.
      
      TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it
      in mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so
      delete the "recently referenced" comment from try_to_unmap_one() (once
      upon a time the comment was near the removed codeblock, but they drifted
      apart).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Alistair Popple <apopple@nvidia.com>
      Link: https://lore.kernel.org/lkml/563ce5b2-7a44-5b4d-1dfd-59a0e65932a9@google.com/
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64b586d1
    • Linus Torvalds's avatar
      Merge tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 98f7fdce
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
       "Two fixes:
      
         - Fix a MIPS IRQ handling RCU bug
      
         - Remove a DocBook annotation for a parameter that doesn't exist
           anymore"
      
      * tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/mips: Fix RCU violation when using irqdomain lookup on interrupt entry
        genirq/irqdesc: Drop excess kernel-doc entry @lookup
      98f7fdce
    • Linus Torvalds's avatar
      Merge tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 877029d9
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Three fixes:
      
         - Fix load tracking bug/inconsistency
      
         - Fix a sporadic CFS bandwidth constraints enforcement bug
      
         - Fix a uclamp utilization tracking bug for newly woken tasks"
      
      * tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/uclamp: Ignore max aggregation if rq is idle
        sched/fair: Fix CFS bandwidth hrtimer expiry type
        sched/fair: Sync load_sum with load_avg after dequeue
      877029d9
    • Linus Torvalds's avatar
      Merge tag 'perf-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 936b664f
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "A fix and a hardware-enablement addition:
      
         - Robustify uncore_snbep's skx_iio_set_mapping()'s error cleanup
      
         - Add cstate event support for Intel ICELAKE_X and ICELAKE_D"
      
      * tag 'perf-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel/uncore: Clean up error handling path of iio mapping
        perf/x86/cstate: Add ICELAKE_X and ICELAKE_D support
      936b664f
    • Linus Torvalds's avatar
      Merge tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 301c8b1d
      Linus Torvalds authored
      Pull locking fixes from Ingo Molnar:
      
       - Fix a Sparc crash
      
       - Fix a number of objtool warnings
      
       - Fix /proc/lockdep output on certain configs
      
       - Restore a kprobes fail-safe
      
      * tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/atomic: sparc: Fix arch_cmpxchg64_local()
        kprobe/static_call: Restore missing static_call_text_reserved()
        static_call: Fix static_call_text_reserved() vs __init
        jump_label: Fix jump_label_text_reserved() vs __init
        locking/lockdep: Fix meaningless /proc/lockdep output of lock classes on !CONFIG_PROVE_LOCKING
      301c8b1d
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 8b9cc17a
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "This is a set of minor fixes and clean ups in the core and various
        drivers.
      
        The only core change in behaviour is the I/O retry for spinup notify,
        but that shouldn't impact anything other than the failing case"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
        scsi: virtio_scsi: Add validation for residual bytes from response
        scsi: ipr: System crashes when seeing type 20 error
        scsi: core: Retry I/O for Notify (Enable Spinup) Required error
        scsi: mpi3mr: Fix warnings reported by smatch
        scsi: qedf: Add check to synchronize abort and flush
        scsi: MAINTAINERS: Add mpi3mr driver maintainers
        scsi: libfc: Fix array index out of bound exception
        scsi: mvsas: Use DEVICE_ATTR_RO()/RW() macro
        scsi: megaraid_mbox: Use DEVICE_ATTR_ADMIN_RO() macro
        scsi: qedf: Use DEVICE_ATTR_RO() macro
        scsi: qedi: Use DEVICE_ATTR_RO() macro
        scsi: message: mptfc: Switch from pci_ to dma_ API
        scsi: be2iscsi: Fix some missing space in some messages
        scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
        scsi: ufs: Fix build warning without CONFIG_PM
        scsi: bnx2fc: Remove meaningless bnx2fc_abts_cleanup() return value assignment
        scsi: qla2xxx: Add heartbeat check
        scsi: virtio_scsi: Do not overwrite SCSI status
        scsi: libsas: Add LUN number check in .slave_alloc callback
        scsi: core: Inline scsi_mq_alloc_queue()
        ...
      8b9cc17a
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-for-v5.14-2021-07-10' of... · b1412bd7
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v5.14-2021-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull more perf tool updates from Arnaldo Carvalho de Melo:
       "New features:
      
         - Enable use of BPF counters with 'perf stat --for-each-cgroup',
           using per-CPU 'cgroup-switch' events with an attached BPF program
           that does aggregation per-cgroup in the kernel instead of using
           per-cgroup perf events.
      
         - Add Topdown metrics L2 events as default events in 'perf stat' for
           systems having those events.
      
        Hardware tracing:
      
         - Add a config for max loops without consuming a packet in the Intel
           PT packet decoder, set via 'perf config intel-pt.max-loops=N'
      
        Hardware enablement:
      
         - Disable misleading NMI watchdog message in 'perf stat' on hybrid
           systems such as Intel Alder Lake.
      
         - Add a dummy event on hybrid systems to collect metadata records.
      
         - Add 24x7 nest metric events for the Power10 platform.
      
        Fixes:
      
         - Fix event parsing for PMUs starting with the same prefix.
      
         - Fix the 'perf trace' 'trace' alias installation dir.
      
         - Fix buffer size to report iregs in perf script python scripts,
           supporting the extended registers in PowerPC.
      
         - Fix overflow in elf_sec__is_text().
      
         - Fix 's' on source line when disasm is empty in the annotation TUI,
           accessible via 'perf annotate', 'perf report' and 'perf top'.
      
         - Plug leaks in scandir() returned dirent entries in 'perf test' when
           sorting the shell tests.
      
         - Fix --task and --stat with pipe input in 'perf report'.
      
         - Fix 'perf probe' use of debuginfo files by build id.
      
         - If a DSO has both dynsym and symtab ELF sections, read from both
           when loading the symbol table, fixing a problem processing Fedora
           32 glibc DSOs.
      
        Libraries:
      
         - Add grouping of events to libperf, from code in tools/perf,
           allowing libperf users to use that mode.
      
        Misc:
      
         - Filter plt stubs from the 'perf probe --functions' output.
      
         - Update UAPI header copies for asound, DRM, mman-common.h and the
           ones affected by the quotactl_fd syscall"
      
      * tag 'perf-tools-for-v5.14-2021-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (29 commits)
        perf test: Add free() calls for scandir() returned dirent entries
        libperf: Add tests for perf_evlist__set_leader()
        libperf: Remove BUG_ON() from library code in get_group_fd()
        libperf: Add group support to perf_evsel__open()
        perf tools: Fix pattern matching for same substring in different PMU type
        perf record: Add a dummy event on hybrid systems to collect metadata records
        perf stat: Add Topdown metrics L2 events as default events
        libperf: Adopt evlist__set_leader() from tools/perf as perf_evlist__set_leader()
        libperf: Move 'nr_groups' from tools/perf to evlist::nr_groups
        libperf: Move 'leader' from tools/perf to perf_evsel::leader
        libperf: Move 'idx' from tools/perf to perf_evsel::idx
        libperf: Change tests to single static and shared binaries
        perf intel-pt: Add a config for max loops without consuming a packet
        perf stat: Disable the NMI watchdog message on hybrid
        perf vendor events power10: Adds 24x7 nest metric events for power10 platform
        perf script python: Fix buffer size to report iregs in perf script
        perf trace: Fix the perf trace link location
        perf top: Fix overflow in elf_sec__is_text()
        perf annotate: Fix 's' on source line when disasm is empty
        perf probe: Do not show @plt function by default
        ...
      b1412bd7
  4. 10 Jul, 2021 10 commits
    • Linus Torvalds's avatar
      Merge tag 'rtc-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · de554096
      Linus Torvalds authored
      Pull RTC updates from Alexandre Belloni:
       "Mostly documentation/comment changes and non urgent fixes.
      
         - add or fix SPDX identifiers
      
         - NXP pcf*: fix datasheet URLs
      
         - imxdi: add wakeup support
      
         - pcf2127: handle timestamp interrupts, this fixes a possible
           interrupt storm
      
         - bd70528: Drop BD70528 support"
      
      * tag 'rtc-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (33 commits)
        rtc: pcf8523: rename register and bit defines
        rtc: pcf2127: handle timestamp interrupts
        rtc: at91sam9: Remove unnecessary offset variable checks
        rtc: s5m: Check return value of s5m_check_peding_alarm_interrupt()
        rtc: spear: convert to SPDX identifier
        rtc: tps6586x: convert to SPDX identifier
        rtc: tps80031: convert to SPDX identifier
        rtc: rtd119x: Fix format of SPDX identifier
        rtc: sc27xx: Fix format of SPDX identifier
        rtc: palmas: convert to SPDX identifier
        rtc: max6900: convert to SPDX identifier
        rtc: ds1374: convert to SPDX identifier
        rtc: au1xxx: convert to SPDX identifier
        rtc: pcf85063: Update the PCF85063A datasheet revision
        dt-bindings: rtc: ti,bq32k: take maintainership
        rtc: pcf8563: Fix the datasheet URL
        rtc: pcf85063: Fix the datasheet URL
        rtc: pcf2127: Fix the datasheet URL
        dt-bindings: rtc: ti,bq32k: Convert to json-schema
        dt-bindings: rtc: rx8900: Convert to YAML schema
        ...
      de554096
    • Mel Gorman's avatar
      mm/page_alloc: Revert pahole zero-sized workaround · 6bce2443
      Mel Gorman authored
      Commit dbbee9d5 ("mm/page_alloc: convert per-cpu list protection to
      local_lock") folded in a workaround patch for pahole that was unable to
      deal with zero-sized percpu structures.
      
      A superior workaround is achieved with commit a0b8200d ("kbuild:
      skip per-CPU BTF generation for pahole v1.18-v1.21").
      
      This patch reverts the dummy field and the pahole version check.
      
      Fixes: dbbee9d5 ("mm/page_alloc: convert per-cpu list protection to local_lock")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6bce2443
    • Alexandre Belloni's avatar
      rtc: pcf8523: rename register and bit defines · 4aa90c03
      Alexandre Belloni authored
      arch/arm/mach-ixp4xx/include/mach/platform.h now gets included indirectly
      and defines REG_OFFSET. Rename the register and bit definition to something
      specific to the driver.
      
      Fixes: 7fd70c65 ("ARM: irqstat: Get rid of duplicated declaration")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Link: https://lore.kernel.org/r/20210710211431.1393589-1-alexandre.belloni@bootlin.com
      4aa90c03
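The kind of name clash described in the commit above, and why a driver-specific prefix avoids it, can be sketched as follows. The macro values and the `MYDRV_` prefix are illustrative assumptions, not the definitions from either the platform header or the driver.

```c
#include <assert.h>

/* Stand-in for a platform header that leaks a generic macro name. */
#define REG_OFFSET 0x10

/*
 * A driver that also defined REG_OFFSET with a different value would
 * trigger a redefinition diagnostic once the header above is included
 * indirectly. A driver-specific prefix keeps the two names apart.
 */
#define MYDRV_REG_OFFSET 0x0e

/* Hypothetical accessor returning the driver's own register address. */
int sketch_driver_offset_reg(void)
{
    /* Both names coexist; the driver never sees the platform value. */
    assert(MYDRV_REG_OFFSET != REG_OFFSET);
    return MYDRV_REG_OFFSET;
}
```

Prefixing every register and bit definition with the driver name is a naming convention rather than a language feature, so it only helps if applied consistently across the file.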
    • Linus Torvalds's avatar
      Merge tag '5.14-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · 1e16624d
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "13 cifs/smb3 fixes. Most are to address minor issues pointed out by
        Coverity.
      
        Also includes a packet signing enhancement and mount improvement"
      
      * tag '5.14-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal version number
        cifs: prevent NULL deref in cifs_compose_mount_options()
        SMB3.1.1: Add support for negotiating signing algorithm
        cifs: use helpers when parsing uid/gid mount options and validate them
        CIFS: Clarify SMB1 code for POSIX Lock
        CIFS: Clarify SMB1 code for rename open file
        CIFS: Clarify SMB1 code for delete
        CIFS: Clarify SMB1 code for SetFileSize
        smb3: fix typo in header file
        CIFS: Clarify SMB1 code for UnixSetPathInfo
        CIFS: Clarify SMB1 code for UnixCreateSymLink
        cifs: clarify SMB1 code for UnixCreateHardLink
        cifs: make locking consistent around the server session status
      1e16624d
    • Linus Torvalds's avatar
      Merge tag 'pci-v5.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 67d8d365
      Linus Torvalds authored
      Pull pci fix from Bjorn Helgaas:
       "Revert host bridge window patch that fixed HP EliteDesk 805 G6, but
        broke ppc:sam460ex (Bjorn Helgaas)"
      
      * tag 'pci-v5.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        Revert "PCI: Coalesce host bridge contiguous apertures"
      67d8d365
    • Linus Torvalds's avatar
      Merge tag 'i3c/for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux · 88bbd8a0
      Linus Torvalds authored
      Pull i3c updates from Alexandre Belloni:
      
       - two small fixes to the svc driver
      
      * tag 'i3c/for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
        i3c: master: svc: fix doc warning in svc-i3c-master.c
        i3c: master: svc: drop free_irq of devm_request_irq allocated irq
      88bbd8a0
    • Linus Torvalds's avatar
      Merge tag 'thermal-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux · f7ea4be4
      Linus Torvalds authored
      Pull thermal updates from Daniel Lezcano:
      
       - Add rk3568 sensor support (Finley Xiao)
      
       - Add missing MODULE_DEVICE_TABLE for the Spreadtrum sensor (Chunyan
         Zhang)
      
       - Export additional attributes for the int340x thermal processor
         (Srinivas Pandruvada)
      
       - Add SC7280 compatible for the tsens driver (Rajeshwari Ravindra
         Kamble)
      
       - Fix kernel documentation for thermal_zone_device_unregister() and use
         devm_platform_get_and_ioremap_resource() (Yang Yingliang)
      
       - Fix coefficient calculations for the rcar_gen3 sensor driver (Niklas
         Söderlund)
      
       - Fix shadowing variable rcar_gen3_ths_tj_1 (Geert Uytterhoeven)
      
       - Add missing of_node_put() for the iMX and Spreadtrum sensors
         (Krzysztof Kozlowski)
      
       - Add tegra3 thermal sensor DT bindings (Dmitry Osipenko)
      
       - Stop the thermal zone monitoring when unregistering it to prevent a
         temperature update without the 'get_temp' callback (Dmitry Osipenko)
      
       - Add rk3568 DT bindings, convert bindings to yaml schemas and add the
         corresponding compatible in the Rockchip sensor (Ezequiel Garcia)
      
       - Add the sc8180x compatible for the Qualcomm tsensor (Bjorn Andersson)
      
       - Use the find_first_zero_bit() function instead of custom code (Andy
         Shevchenko)
      
       - Fix the kernel doc for the device cooling device (Yang Li)
      
       - Reorg the processor thermal int340x to set the scene for the PCI mmio
         driver (Srinivas Pandruvada)
      
       - Add PCI MMIO driver for the int340x processor thermal driver
         (Srinivas Pandruvada)
      
       - Add hwmon sensors for the mediatek sensor (Frank Wunderlich)
      
       - Fix warning for return value reported by Smatch for the int340x
         thermal processor (Srinivas Pandruvada)
      
       - Fix wrong register access and decoding for the int340x thermal
         processor (Srinivas Pandruvada)
      
      * tag 'thermal-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (23 commits)
        thermal/drivers/int340x/processor_thermal: Fix tcc setting
        thermal/drivers/int340x/processor_thermal: Fix warning for return value
        thermal/drivers/mediatek: Add sensors-support
        thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driver
        thermal/drivers/int340x/processor_thermal: Split enumeration and processing part
        thermal: devfreq_cooling: Fix kernel-doc
        thermal/drivers/intel/intel_soc_dts_iosf: Switch to use find_first_zero_bit()
        dt-bindings: thermal: tsens: Add sc8180x compatible
        dt-bindings: rockchip-thermal: Support the RK3568 SoC compatible
        dt-bindings: thermal: convert rockchip-thermal to json-schema
        thermal/core/thermal_of: Stop zone device before unregistering it
        dt-bindings: thermal: Add binding for Tegra30 thermal sensor
        thermal/drivers/sprd: Add missing of_node_put for loop iteration
        thermal/drivers/imx_sc: Add missing of_node_put for loop iteration
        thermal/drivers/rcar_gen3_thermal: Do not shadow rcar_gen3_ths_tj_1
        thermal/drivers/rcar_gen3_thermal: Fix coefficient calculations
        thermal/drivers/st: Use devm_platform_get_and_ioremap_resource()
        thermal/core: Correct function name thermal_zone_device_unregister()
        dt-bindings: thermal: tsens: Add compatible string to TSENS binding for SC7280
        thermal/drivers/int340x: processor_thermal: Export additional attributes
        ...
      f7ea4be4
    • Linus Torvalds's avatar
      Merge tag 'kbuild-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 81361b83
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - Increase the -falign-functions alignment for the debug option.
      
       - Remove ugly libelf checks from the top Makefile.
      
       - Make the silent build (-s) more silent.
      
       - Re-compile the kernel if KBUILD_BUILD_TIMESTAMP is specified.
      
       - Various script cleanups
      
      * tag 'kbuild-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (27 commits)
        scripts: add generic syscallnr.sh
        scripts: check duplicated syscall number in syscall table
        sparc: syscalls: use pattern rules to generate syscall headers
        parisc: syscalls: use pattern rules to generate syscall headers
        nds32: add arch/nds32/boot/.gitignore
        kbuild: mkcompile_h: consider timestamp if KBUILD_BUILD_TIMESTAMP is set
        kbuild: modpost: Explicitly warn about unprototyped symbols
        kbuild: remove trailing slashes from $(KBUILD_EXTMOD)
        kconfig.h: explain IS_MODULE(), IS_ENABLED()
        kconfig: constify long_opts
        scripts/setlocalversion: simplify the short version part
        scripts/setlocalversion: factor out 12-chars hash construction
        scripts/setlocalversion: add more comments to -dirty flag detection
        scripts/setlocalversion: remove workaround for old make-kpkg
        scripts/setlocalversion: remove mercurial, svn and git-svn supports
        kbuild: clean up ${quiet} checks in shell scripts
        kbuild: sink stdout from cmd for silent build
        init: use $(call cmd,) for generating include/generated/compile.h
        kbuild: merge scripts/mkmakefile to top Makefile
        sh: move core-y in arch/sh/Makefile to arch/sh/Kbuild
        ...
      81361b83
    • Linus Torvalds's avatar
      Merge tag 's390-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · e98e03d0
      Linus Torvalds authored
      Pull more s390 updates from Vasily Gorbik:
      
       - Fix preempt_count initialization.
      
       - Rework call_on_stack() macro to add proper type handling and avoid
         possible register corruption.
      
       - More error prone "register asm" removal and fixes.
      
       - Fix syscall restarting when multiple signals are coming in. This adds
         minimalistic trampolines to vdso so we can return from signal without
         using the stack which requires pgm check handler hacks when NX is
         enabled.
      
       - Remove HAVE_IRQ_EXIT_ON_IRQ_STACK since this is no longer true after
         switch to generic entry.
      
       - Fix protected virtualization secure storage access exception
         handling.
      
       - Make machine check C handler always enter with DAT enabled and move
         register validation to C code.
      
       - Fix tinyconfig boot problem by avoiding MONITOR CALL without
         CONFIG_BUG.
      
       - Increase asm symbols alignment to 16 to make it consistent with
         compilers.
      
       - Enable concurrent access to the CPU Measurement Counter Facility.
      
       - Add support for dynamic AP bus size limit and rework ap_dqap to deal
         with messages greater than recv buffer.
      
      * tag 's390-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
        s390: preempt: Fix preempt_count initialization
        s390/linkage: increase asm symbols alignment to 16
        s390: rename CALL_ON_STACK_NORETURN() to call_on_stack_noreturn()
        s390: add type checking to CALL_ON_STACK_NORETURN() macro
        s390: remove old CALL_ON_STACK() macro
        s390/softirq: use call_on_stack() macro
        s390/lib: use call_on_stack() macro
        s390/smp: use call_on_stack() macro
        s390/kexec: use call_on_stack() macro
        s390/irq: use call_on_stack() macro
        s390/mm: use call_on_stack() macro
        s390: introduce proper type handling call_on_stack() macro
        s390/irq: simplify on_async_stack()
        s390/irq: inline do_softirq_own_stack()
        s390/irq: simplify do_softirq_own_stack()
        s390/ap: get rid of register asm in ap_dqap()
        s390: rename PIF_SYSCALL_RESTART to PIF_EXECVE_PGSTE_RESTART
        s390: move restart of execve() syscall
        s390/signal: remove sigreturn on stack
        s390/signal: switch to using vdso for sigreturn and syscall restart
        ...
      e98e03d0
    • Linus Torvalds's avatar
      Merge tag 'mips_5.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 379cf80a
      Linus Torvalds authored
      Pull MIPS fixes from Thomas Bogendoerfer:
      
       - fix for accessing gic via vdso
      
       - two build fixes
      
      * tag 'mips_5.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: vdso: Invalid GIC access through VDSO
        mips: disable branch profiling in boot/decompress.o
        mips: always link byteswap helpers into decompressor
      379cf80a