1. 10 Aug, 2023 4 commits
    • vhost-scsi: Rename vhost_scsi_iov_to_sgl · c5ace19e
      Mike Christie authored
      Rename vhost_scsi_iov_to_sgl to vhost_scsi_map_iov_to_sgl so it matches
      the naming style used for vhost_scsi_copy_iov_to_sgl.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230709202859.138387-3-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
    • vhost-scsi: Fix alignment handling with windows · 5ced58bf
      Mike Christie authored
      The Linux block layer requires bios/requests to have lengths with a 512
      byte alignment. Some drivers/layers like dm-crypt and the direct IO code
      will test for it and just fail. Other drivers like SCSI just assume the
      requirement is met and will end up in infinite retry loops. The problem
      for drivers like SCSI is that they use functions like blk_rq_cur_sectors
      and blk_rq_sectors, which divide the request's length by 512. If there
      are leftovers, they just get dropped. But other code in the block/SCSI
      layer may use blk_rq_bytes/blk_rq_cur_bytes and end up thinking there is
      still data left and try to retry the cmd. We can then end up getting
      stuck in retry loops where part of the block/SCSI layer thinks there is
      data left, but other parts think we want to do IOs of zero length.
      
      Linux will always check for alignment, but Windows will not. When
      vhost-scsi then translates the iovec it gets from a Windows guest to a
      scatterlist, we can end up with sg items where the sg->length is not
      divisible by 512 due to the misaligned offset:
      
      sg[0].offset = 255;
      sg[0].length = 3841;
      sg...
      sg[N].offset = 0;
      sg[N].length = 255;
      
      When the LIO backends then convert the SG to bios or other iovecs, we
      end up sending them with the same misaligned values and can hit the
      issues above.
      
      This patch falls back to allocating a temp page and copying the data
      when we detect a misaligned buffer and the IO is large enough that it
      will get split into multiple bad IOs.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230709202859.138387-2-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
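The sector arithmetic behind the vhost-scsi alignment fix above can be checked with a small userspace sketch. This is a model under stated assumptions, not the driver code: `struct sg_entry`, `bytes_to_sectors`, `leftover_bytes`, and `sgl_is_misaligned` are hypothetical names, and only the 512-byte sector size and the example lengths come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)   /* 512 bytes */

/* Hypothetical stand-in for one scatterlist entry (not struct scatterlist). */
struct sg_entry {
	unsigned int offset;
	unsigned int length;
};

/* Whole sectors in a byte count; any remainder is silently dropped. */
static unsigned int bytes_to_sectors(unsigned int bytes)
{
	return bytes >> SECTOR_SHIFT;
}

/* Bytes that sector-based accounting never sees. */
static unsigned int leftover_bytes(unsigned int bytes)
{
	return bytes & (SECTOR_SIZE - 1);
}

/* True if any entry's length is not a multiple of 512 bytes. */
static bool sgl_is_misaligned(const struct sg_entry *sgl, size_t nents)
{
	assert(sgl != NULL || nents == 0);

	for (size_t i = 0; i < nents; i++)
		if (leftover_bytes(sgl[i].length))
			return true;
	return false;
}
```

With the values from the message, a 3841-byte entry yields 7 whole sectors and 257 leftover bytes, while the 255-byte tail yields 0 sectors. Byte-based accounting still sees data there, which is the mismatch that can lead to the retry loop.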
    • pds_vdpa: protect Makefile from unconfigured debugfs · 9ad1a29c
      Shannon Nelson authored
      debugfs.h protects itself from an undefined DEBUG_FS, so it is
      not necessary to check it in the driver code or the Makefile.
      The driver code had been updated for this, but the Makefile had
      missed the update.
      
      Link: https://lore.kernel.org/linux-next/fec68c3c-8249-7af4-5390-0495386a76f9@infradead.org/
      Fixes: a16291b5 ("pds_vdpa: Add new vDPA driver for AMD/Pensando DSC")
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Message-Id: <20230706231718.54198-1-shannon.nelson@amd.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Acked-by: Jason Wang <jasowang@redhat.com>
    • virtio-mmio: don't break lifecycle of vm_dev · 55c91fed
      Wolfram Sang authored
      vm_dev has a separate lifecycle because it has a 'struct device'
      embedded. Thus, having a release callback for it is correct.
      
      Allocating the vm_dev struct with devres totally breaks this protection,
      though. Instead of waiting for the vm_dev release callback, the memory
      is freed when the platform_device is removed, resulting in a
      use-after-free when the callback is finally called.
      
      To easily see the problem, compile the kernel with
      CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs.
      
      The fix is easy: don't use devres in this case.
      
      Found during my research about object lifetime problems.
      
      Fixes: 7eb781b1 ("virtio_mmio: add cleanup for virtio_mmio_probe")
      Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
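The lifetime rule in the virtio-mmio message above (memory holding an embedded refcounted object must be freed by its release callback, not by an unrelated owner) can be modelled in userspace. This is only a sketch: every `toy_` name is hypothetical, `calloc`/`free` stand in for kernel allocation, and nothing here is the driver's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for 'struct device': a refcount plus a release callback. */
struct toy_device {
	int refcount;
	void (*release)(struct toy_device *dev);
};

/* Hypothetical container, playing the role of vm_dev. */
struct toy_vm_dev {
	struct toy_device dev;   /* embedded, so it shares the container's memory */
	int *released_flag;      /* lets a caller observe when release ran */
};

static void toy_vm_dev_release(struct toy_device *dev)
{
	/* 'dev' is the first member, so the cast recovers the container. */
	struct toy_vm_dev *vm_dev = (struct toy_vm_dev *)dev;

	*vm_dev->released_flag = 1;
	free(vm_dev);            /* the only place the memory is freed */
}

static struct toy_vm_dev *toy_vm_dev_create(int *released_flag)
{
	struct toy_vm_dev *vm_dev = calloc(1, sizeof(*vm_dev));

	if (!vm_dev)
		return NULL;
	vm_dev->dev.refcount = 1;
	vm_dev->dev.release = toy_vm_dev_release;
	vm_dev->released_flag = released_flag;
	return vm_dev;
}

static void toy_get(struct toy_device *dev)
{
	dev->refcount++;
}

/* Drop one reference; the last drop runs the release callback. */
static void toy_put(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

In this model the container stays valid for as long as anyone holds a reference. Freeing it earlier, which is what a devres-managed allocation does at unbind time, would leave the last `toy_put()` calling into freed memory.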
  2. 06 Aug, 2023 8 commits
    • Linux 6.5-rc5 · 52a93d39
      Linus Torvalds authored
    • Merge tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 0108963f
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
      
       - Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup
      
       - Clean up directory iterators and clarify file_needs_f_pos_lock()
      
      * tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: rely on ->iterate_shared to determine f_pos locking
        vfs: get rid of old '->iterate' directory operation
        proc: fix missing conversion to 'iterate_shared'
        open: make RESOLVE_CACHED correctly test for O_TMPFILE
    • fs: rely on ->iterate_shared to determine f_pos locking · 7d84d1b9
      Christian Brauner authored
      Now that we removed ->iterate we don't need to check for either
      ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
      for ->iterate_shared instead. This will tell us whether we need to
      unconditionally take the lock. Not only does this let us avoid
      checking f_inode's mode, it also clearly shows that we're locking
      because of readdir.
      Signed-off-by: Christian Brauner <brauner@kernel.org>
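The check described in the f_pos locking message above can be sketched as a userspace model. The `toy_` types and `TOY_FMODE_ATOMIC_POS` are hypothetical stand-ins for the kernel's structures, and the surrounding conditions (an atomic-position mode flag and a shared-file count) are assumptions about how the helper is shaped. Only the "test for `->iterate_shared`" part comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FMODE_ATOMIC_POS 0x1u   /* stand-in for an atomic-f_pos mode flag */

struct toy_file_operations {
	/* Non-NULL only for files that support directory iteration. */
	int (*iterate_shared)(void *file, void *ctx);
};

struct toy_file {
	unsigned int f_mode;
	long f_count;                          /* how many holders share the file */
	const struct toy_file_operations *f_op;
};

/* Dummy readdir implementation, used only to mark a file as a directory. */
static int toy_readdir(void *file, void *ctx)
{
	(void)file;
	(void)ctx;
	return 0;
}

/*
 * Lock f_pos when the file is shared, or unconditionally when it can be
 * iterated as a directory. No inode mode check is needed.
 */
static bool toy_file_needs_f_pos_lock(const struct toy_file *file)
{
	assert(file && file->f_op);

	return (file->f_mode & TOY_FMODE_ATOMIC_POS) &&
	       (file->f_count > 1 || file->f_op->iterate_shared != NULL);
}
```

In this model an unshared regular file skips the lock, while a directory always takes it, which makes the readdir motivation visible in the condition itself.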
    • vfs: get rid of old '->iterate' directory operation · 3e327154
      Linus Torvalds authored
      All users now just use '->iterate_shared()', which only takes the
      directory inode lock for reading.
      
      Filesystems that never got converted to shared mode now instead use a
      wrapper that drops the lock, re-takes it in write mode, calls the old
      function, and then downgrades the lock back to read mode.
      
      This way the VFS layer and other callers no longer need to care about
      filesystems that never got converted to the modern era.
      
      The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
      ntfs, ocfs2, overlayfs, and vboxsf.
      
      Honestly, several of them look like they really could just iterate their
      directories in shared mode and skip the wrapper entirely, but the point
      of this change is to not change semantics or fix filesystems that
      haven't been fixed in the last 7+ years, but to finally get rid of the
      dual iterators.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
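The wrapper sequence described in the message above (drop the read lock, re-take it in write mode, call the old function, downgrade back to read mode) can be traced with a single-threaded model. All names here are hypothetical and the lock is just an enum, so this shows the ordering of the steps only, not real locking or the kernel's implementation.

```c
#include <assert.h>

enum toy_lock_state { TOY_UNLOCKED, TOY_READ_LOCKED, TOY_WRITE_LOCKED };

/* Hypothetical directory inode carrying only its lock state. */
struct toy_inode {
	enum toy_lock_state lock;
};

typedef int (*toy_iterate_fn)(struct toy_inode *dir, void *ctx);

/* A legacy iterator that expects exclusive access; records what it saw. */
static int toy_legacy_iterate(struct toy_inode *dir, void *ctx)
{
	int *saw_write_lock = ctx;

	*saw_write_lock = (dir->lock == TOY_WRITE_LOCKED);
	return 0;
}

/* Adapts a legacy exclusive-mode iterator to a caller that holds a read lock. */
static int toy_wrap_directory_iterator(struct toy_inode *dir, void *ctx,
				       toy_iterate_fn old_iterate)
{
	int ret;

	assert(dir->lock == TOY_READ_LOCKED);   /* caller holds it shared */

	dir->lock = TOY_UNLOCKED;               /* drop the read lock */
	dir->lock = TOY_WRITE_LOCKED;           /* re-take it in write mode */

	ret = old_iterate(dir, ctx);            /* old function runs exclusively */

	dir->lock = TOY_READ_LOCKED;            /* downgrade back to read mode */
	return ret;
}
```

The caller enters and leaves with the lock held for reading, so the exclusive-locking requirement stays inside the wrapper. A single-threaded model cannot show the window between dropping and re-taking the lock, which real code has to tolerate.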
    • proc: fix missing conversion to 'iterate_shared' · 0a2c2baa
      Linus Torvalds authored
      I'm looking at the directory handling due to the discussion about f_pos
      locking (see commit 79796425: "file: reinstate f_pos locking
      optimization for regular files"), and wanting to clean that up.
      
      And one source of ugliness is how we were supposed to move filesystems
      over to the '->iterate_shared()' function that only takes the inode lock
      for reading many many years ago, but several filesystems still use the
      bad old '->iterate()' that takes the inode lock for exclusive access.
      
      See commit 61922694 ("introduce a parallel variant of ->iterate()")
      that also added some documentation stating
      
            Old method is only used if the new one is absent; eventually it will
            be removed.  Switch while you still can; the old one won't stay.
      
      and that was back in April 2016.  Here we are, many years later, and the
      old version is still clearly sadly alive and well.
      
      Now, some of those old style iterators are probably just because the
      filesystem may end up having per-inode mutable data that it uses for
      iterating a directory, but at least one case is just a mistake.
      
      Al switched over most filesystems to use '->iterate_shared()' back when
      it was introduced.  In particular, the /proc filesystem was converted as
      one of the first ones in commit f50752ea ("switch all procfs
      directories ->iterate_shared()").
      
      But later, one new user of '->iterate()' was re-introduced by
      commit 6d9c939d ("procfs: add smack subdir to attrs").
      
      And that's clearly not what we wanted, since that new case just uses the
      same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
      that other /proc pident directories use, and they are most definitely
      safe to use with the inode lock held shared.
      
      So just fix it.
      
      This still leaves a fair number of oddball filesystems using the
      old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
      overlayfs, and vboxsf), but at least we don't have any remaining in the
      core filesystems.
      
      I'm going to add a wrapper function that just drops the read-lock and
      takes it as a write lock, so that we can clean up the core vfs layer and
      make all the ugly 'this filesystem needs exclusive inode locking' be
      just filesystem-internal warts.
      
      I just didn't want to make that conversion when we still had a core user
      left.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • open: make RESOLVE_CACHED correctly test for O_TMPFILE · a0fc452a
      Aleksa Sarai authored
      O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
      fast-path check for RESOLVE_CACHED would reject all users passing
      O_DIRECTORY with -EAGAIN, when in fact the intended test was to check
      for __O_TMPFILE.
      
      Cc: stable@vger.kernel.org # v5.12+
      Fixes: 99668f61 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
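The flag mix-up in the RESOLVE_CACHED message above comes down to testing a two-bit mask with a plain AND. The sketch below uses made-up bit values and hypothetical `TOY_` names rather than the real constants, which vary by architecture. `TOY_TMPFILE_BIT` plays the role of `__O_TMPFILE`.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real constants are arch-specific. */
#define TOY_O_DIRECTORY  0x1
#define TOY_TMPFILE_BIT  0x2   /* plays the role of __O_TMPFILE */
#define TOY_O_TMPFILE    (TOY_TMPFILE_BIT | TOY_O_DIRECTORY)

static_assert((TOY_TMPFILE_BIT & TOY_O_DIRECTORY) == 0,
	      "the two flags must use distinct bits");

/* Old-style test: fires when ANY bit of the composite flag is set. */
static bool rejects_buggy(int flags)
{
	return (flags & TOY_O_TMPFILE) != 0;
}

/* Intended test: fires only when the tmpfile-specific bit is set. */
static bool rejects_fixed(int flags)
{
	return (flags & TOY_TMPFILE_BIT) != 0;
}
```

A plain directory open sets only `TOY_O_DIRECTORY`, so the old-style test rejects it while the fixed test lets it through. Both tests still reject a real tmpfile open.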
    • Merge tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux · f0ab9f34
      Linus Torvalds authored
      Pull rust fixes from Miguel Ojeda:
      
       - Allocator: prevent mis-aligned allocation
      
       - Types: delete 'ForeignOwnable::borrow_mut'. A sound replacement is
         planned for the merge window
      
       - Build: fix bindgen error with UBSAN_BOUNDS_STRICT
      
      * tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux:
        rust: fix bindgen build error with UBSAN_BOUNDS_STRICT
        rust: delete `ForeignOwnable::borrow_mut`
        rust: allocator: Prevent mis-aligned allocation
    • Merge tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · fb0d9199
      Linus Torvalds authored
      Pull ata fix from Damien Le Moal:
      
       - Prevent the scsi disk driver from issuing a START STOP UNIT command
         for ATA devices during system resume as this causes various issues
         reported by multiple users.
      
      * tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata,scsi: do not issue START STOP UNIT on resume
  3. 05 Aug, 2023 5 commits
  4. 04 Aug, 2023 13 commits
  5. 03 Aug, 2023 10 commits