Commits · dcd1a59c62dc49da75539213611156d6db50ab5d · Kirill Smelkov / linux

20 Oct, 2022 7 commits

blktrace: fix possible memleak in '__blk_trace_remove' · dcd1a59c

Ye Bin authored Oct 19, 2022

When test as follows:
step1: ioctl(sda, BLKTRACESETUP, &arg)
step2: ioctl(sda, BLKTRACESTART, NULL)
step3: ioctl(sda, BLKTRACETEARDOWN, NULL)
step4: ioctl(sda, BLKTRACESETUP, &arg)
Got issue as follows:
debugfs: File 'dropped' in directory 'sda' already present!
debugfs: File 'msg' in directory 'sda' already present!
debugfs: File 'trace0' in directory 'sda' already present!

And also find syzkaller report issue like "KASAN: use-after-free Read in relay_switch_subbuf"
"https://syzkaller.appspot.com/bug?id=13849f0d9b1b818b087341691be6cc3ac6a6bfb7"

If remove block trace without stop(BLKTRACESTOP) block trace, '__blk_trace_remove'
will just set 'q->blk_trace' with NULL. However, debugfs file isn't removed, so
will report file already present when call BLKTRACESETUP.
static int __blk_trace_remove(struct request_queue *q)
{
        struct blk_trace *bt;

        bt = rcu_replace_pointer(q->blk_trace, NULL,
                                 lockdep_is_held(&q->debugfs_mutex));
        if (!bt)
                return -EINVAL;

	if (bt->trace_state != Blktrace_running)
        	blk_trace_cleanup(q, bt);

        return 0;
}

If do test as follows:
step1: ioctl(sda, BLKTRACESETUP, &arg)
step2: ioctl(sda, BLKTRACESTART, NULL)
step3: ioctl(sda, BLKTRACETEARDOWN, NULL)
step4: remove sda

There will remove debugfs directory which will remove recursively all file
under directory.
>> blk_release_queue
>>	debugfs_remove_recursive(q->debugfs_dir)
So all files which created in 'do_blk_trace_setup' are removed, and
'dentry->d_inode' is NULL. But 'q->blk_trace' is still in 'running_trace_lock',
'trace_note_tsk' will traverse 'running_trace_lock' all nodes.
>>trace_note_tsk
>>  trace_note
>>    relay_reserve
>>       relay_switch_subbuf
>>        d_inode(buf->dentry)->i_size

To solve above issues, reference commit '5afedf67', call 'blk_trace_cleanup'
unconditionally in '__blk_trace_remove' and first stop block trace in
'blk_trace_cleanup'.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019033602.752383-3-yebin@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

dcd1a59c

blktrace: introduce 'blk_trace_{start,stop}' helper · 60a9bb90

Ye Bin authored Oct 19, 2022

Introduce 'blk_trace_{start,stop}' helper. No functional changed.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019033602.752383-2-yebin@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

60a9bb90

bio: safeguard REQ_ALLOC_CACHE bio put · d4347d50

Pavel Begunkov authored Oct 18, 2022

bio_put() with REQ_ALLOC_CACHE assumes that it's executed not from
an irq context. Let's add a warning if the invariant is not respected,
especially since there is a couple of places removing REQ_POLLED by hand
without also clearing REQ_ALLOC_CACHE.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/558d78313476c4e9c233902efa0092644c3d420a.1666122465.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d4347d50

block, bfq: remove unused variable for bfq_queue · 33566f92

Yuwei Guan authored Oct 18, 2022

it defined in d0edc247, but there's nowhere to use it,
so remove it.
Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20221018030139.159-1-Yuwei.Guan@zeekrlife.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

33566f92

drbd: only clone bio if we have a backing device · 6d42ddf7

Christoph Böhmwalder authored Oct 20, 2022

Commit c347a787 (drbd: set ->bi_bdev in drbd_req_new) moved a
bio_set_dev call (which has since been removed) to "earlier", from
drbd_request_prepare to drbd_req_new.

The problem is that this accesses device->ldev->backing_bdev, which is
not NULL-checked at this point. When we don't have an ldev (i.e. when
the DRBD device is diskless), this leads to a null pointer deref.

So, only allocate the private_bio if we actually have a disk. This is
also a small optimization, since we don't clone the bio to only to
immediately free it again in the diskless case.

Fixes: c347a787 ("drbd: set ->bi_bdev in drbd_req_new")
Co-developed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Co-developed-by: Joel Colledge <joel.colledge@linbit.com>
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221020085205.129090-1-christoph.boehmwalder@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

6d42ddf7

Merge tag 'nvme-6.1-2022-10-22' of git://git.infradead.org/nvme into block-6.1 · 70ee4a4c

Jens Axboe authored Oct 20, 2022

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.1

 - fix nvme-hwmon for DMA non-cohehrent architectures (Serge Semin)
 - add a nvme-hwmong maintainer (Christoph Hellwig)
 - fix error pointer dereference in error handling (Dan Carpenter)
 - fix invalid memory reference in nvmet_subsys_attr_qid_max_show
   (Daniel Wagner)
 - don't limit the DMA segment size in nvme-apple (Russell King)
 - fix workqueue MEM_RECLAIM flushing dependency (Sagi Grimberg)
 - disable write zeroes on various Kingston SSDs (Xander Li)"

* tag 'nvme-6.1-2022-10-22' of git://git.infradead.org/nvme:
  nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show
  nvmet: fix workqueue MEM_RECLAIM flushing dependency
  nvme-hwmon: kmalloc the NVME SMART log buffer
  nvme-hwmon: consistently ignore errors from nvme_hwmon_init
  nvme: add Guenther as nvme-hwmon maintainer
  nvme-apple: don't limit DMA segement size
  nvme-pci: disable write zeroes on various Kingston SSD
  nvme: fix error pointer dereference in error handling

70ee4a4c

ublk_drv: use flexible-array member instead of zero-length array · 72495b5a

Yushan Zhou authored Oct 18, 2022

Eliminate the following coccicheck warning:
./drivers/block/ublk_drv.c:127:16-19: WARNING use flexible-array member instead
Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Link: https://lore.kernel.org/r/20221018100132.355393-1-zys.zljxml@gmail.comReviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

72495b5a

19 Oct, 2022 8 commits

nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show · 94f5a068

Daniel Wagner authored Oct 07, 2022

The item passed into nvmet_subsys_attr_qid_max_show is not a member of
struct nvmet_port, it is part of nvmet_subsys.  Hence, don't try to
dereference it as struct nvme_ctrl pointer.

Fixes: 3e980f59 ("nvmet: Expose max queues to configfs")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20220913064203.133536-1-dwagner@suse.deSigned-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

94f5a068

nvmet: fix workqueue MEM_RECLAIM flushing dependency · ddd2b8de

Sagi Grimberg authored Sep 28, 2022

The keep alive timer needs to stay on nvmet_wq, and not
modified to reschedule on the system_wq.

This fixes a warning:
------------[ cut here ]------------
workqueue: WQ_MEM_RECLAIM
nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing
!WQ_MEM_RECLAIM events:nvmet_keep_alive_timer [nvmet]
WARNING: CPU: 3 PID: 1086 at kernel/workqueue.c:2628
check_flush_dependency+0x16c/0x1e0
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 8832cf92 ("nvmet: use a private workqueue instead of the system workqueue")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

ddd2b8de

nvme-hwmon: kmalloc the NVME SMART log buffer · c94b7f9b

Serge Semin authored Oct 18, 2022

Recent commit 52fde2c0 ("nvme: set dma alignment to dword") has
caused a regression on our platform.

It turned out that the nvme_get_log() method invocation caused the
nvme_hwmon_data structure instance corruption. In particular the
nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with
garbage. After some research we discovered that the problem happened
even before the actual NVME DMA execution, but during the buffer mapping.
Since our platform is DMA-noncoherent, the mapping implied the cache-line
invalidations or write-backs depending on the DMA-direction parameter.
In case of the NVME SMART log getting the DMA was performed
from-device-to-memory, thus the cache-invalidation was activated during
the buffer mapping. Since the log-buffer isn't cache-line aligned, the
cache-invalidation caused the neighbour data to be discarded. The
neighbouring data turned to be the data surrounding the buffer in the
framework of the nvme_hwmon_data structure.

In order to fix that we need to make sure that the whole log-buffer is
defined within the cache-line-aligned memory region so the
cache-invalidation procedure wouldn't involve the adjacent data. One of
the option to guarantee that is to kmalloc the DMA-buffer [1]. Seeing the
rest of the NVME core driver prefer that method it has been chosen to fix
this problem too.

Note after a deeper researches we found out that the denoted commit wasn't
a root cause of the problem. It just revealed the invalidity by activating
the DMA-based NVME SMART log getting performed in the framework of the
NVME hwmon driver. The problem was here since the initial commit of the
driver.

[1] Documentation/core-api/dma-api-howto.rst

Fixes: 400b6a7b ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c94b7f9b

nvme-hwmon: consistently ignore errors from nvme_hwmon_init · 6b8cf940

Christoph Hellwig authored Oct 18, 2022

An NVMe controller works perfectly fine even when the hwmon
initialization fails.  Stop returning errors that do not come from a
controller reset from nvme_hwmon_init to handle this case consistently.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>

6b8cf940

nvme: add Guenther as nvme-hwmon maintainer · 6ff5ba97

Christoph Hellwig authored Oct 18, 2022

Given that non of the overall NVMe maintainers knows this code very
deeply it probably makes sense to add Guenther as an additional
MAINTAINER for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Guenter Roeck <linux@roeck-us.net>

6ff5ba97

nvme-apple: don't limit DMA segement size · d622f847

Russell King (Oracle) authored Oct 12, 2022

NVMe uses PRPs for data transfers and has no specific limit for a single
DMA segement.  Limiting the size will cause problems because the block
layer assumes PRP-ish devices using a virt boundary mask don't have a
segment limit.  And while this is true, we also really need to tell the
DMA mapping layer about it, otherwise dma-debug will trip over it.

Fixes: 5bd2927a ("nvme-apple: Add initial Apple SoC NVMe driver")
Suggested-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[hch: rewrote the commit message based on the PCIe commit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>

d622f847

nvme-pci: disable write zeroes on various Kingston SSD · ac9b57d4

Xander Li authored Oct 11, 2022

Kingston SSDs do support NVMe Write_Zeroes cmd but take long time to
process.  The firmware version is locked by these SSDs, we can not expect
firmware improvement, so disable Write_Zeroes cmd.
Signed-off-by: Xander Li <xander_li@kingston.com.tw>
Signed-off-by: Christoph Hellwig <hch@lst.de>

ac9b57d4

nvme: fix error pointer dereference in error handling · 4739824e

Dan Carpenter authored Oct 15, 2022

There is typo here so it releases the wrong variable.  "ctrl->admin_q"
was intended instead of "ctrl->fabrics_q".

Fixes: fe60e8c5 ("nvme: add common helpers to allocate and free tagsets")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

4739824e

18 Oct, 2022 1 commit

Documentation: document ublk user recovery feature · e0539ae0

ZiyangZhang authored Oct 18, 2022

Add documentation for user recovery feature of ublk subsystem.
Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20221018045346.99706-2-ZiyangZhang@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e0539ae0

16 Oct, 2022 1 commit

blk-mq: fix null pointer dereference in blk_mq_clear_rq_mapping() · 76dd2980

Yu Kuai authored Oct 11, 2022

Our syzkaller report a null pointer dereference, root cause is
following:

__blk_mq_alloc_map_and_rqs
 set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs
  blk_mq_alloc_map_and_rqs
   blk_mq_alloc_rqs
    // failed due to oom
    alloc_pages_node
    // set->tags[hctx_idx] is still NULL
    blk_mq_free_rqs
     drv_tags = set->tags[hctx_idx];
     // null pointer dereference is triggered
     blk_mq_clear_rq_mapping(drv_tags, ...)

This is because commit 63064be1 ("blk-mq:
Add blk_mq_alloc_map_and_rqs()") merged the two steps:

1) set->tags[hctx_idx] = blk_mq_alloc_rq_map()
2) blk_mq_alloc_rqs(..., set->tags[hctx_idx])

into one step:

set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs()

Since tags is not initialized yet in this case, fix the problem by
checking if tags is NULL pointer in blk_mq_clear_rq_mapping().

Fixes: 63064be1 ("blk-mq: Add blk_mq_alloc_map_and_rqs()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20221011142253.4015966-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

76dd2980

12 Oct, 2022 6 commits

Merge tag 'nvme-6.1-2022-10-12' of git://git.infradead.org/nvme into block-6.1 · 3bc429c1

Jens Axboe authored Oct 12, 2022

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 6.1

 - add NVME_QUIRK_BOGUS_NID for Lexar NM760 (Abhijit)
 - avoid the deepest sleep state on ZHITAI TiPro5000 SSDs (Xi Ruoyao)
 - fix possible hang caused during ctrl deletion (Sagi Grimberg)
 - fix possible hang in live ns resize with ANA access (Sagi Grimberg)"

* tag 'nvme-6.1-2022-10-12' of git://git.infradead.org/nvme:
  nvme-multipath: fix possible hang in live ns resize with ANA access
  nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760
  nvme-tcp: fix possible hang caused during ctrl deletion
  nvme-rdma: fix possible hang caused during ctrl deletion

3bc429c1

nvme-multipath: fix possible hang in live ns resize with ANA access · 72e3b888

Sagi Grimberg authored Sep 29, 2022

When we revalidate paths as part of ns size change (as of commit
e7d65803), it is possible that during the path revalidation, the
only paths that is IO capable (i.e. optimized/non-optimized) are the
ones that ns resize was not yet informed to the host, which will cause
inflight requests to be requeued (as we have available paths but none
are IO capable). These requests on the requeue list are waiting for
someone to resubmit them at some point.

The IO capable paths will eventually notify the ns resize change to the
host, but there is nothing that will kick the requeue list to resubmit
the queued requests.

Fix this by always kicking the requeue list, and if no IO capable path
exists, these requests will be queued again.

A typical log that indicates that IOs are requeued:
--
nvme nvme1: creating 4 I/O queues.
nvme nvme1: new ctrl: "testnqn1"
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
nvme nvme1: rescanning namespaces.
nvme1n1: detected capacity change from 2097152 to 4194304
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
nvme nvme2: rescanning namespaces.
--
Reported-by: Yogev Cohen <yogev@lightbitslabs.com>
Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Christoph Hellwig <hch@lst.de>

72e3b888

nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs · d5d3c100

Xi Ruoyao authored Sep 28, 2022

ZHITAI TiPro5000 SSDs has the same APST sleep problem as its cousin,
TiPro7000.  The quirk for TiPro7000 has been added in
commit 6b961bce ("nvme-pci: avoid the deepest sleep state on
ZHITAI TiPro7000 SSDs"), use the same quirk for TiPro5000.

The ASPT data from "nvme id-ctrl /dev/nvme1":

vid       : 0x1e49
ssvid     : 0x1e49
sn        : ZTA21T0KA2227304LM
mn        : ZHITAI TiPlus5000 1TB
fr        : ZTA09139
[...]
ps    0 : mp:6.50W operational enlat:0 exlat:0 rrt:0 rrl:0
         rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:5.80W operational enlat:0 exlat:0 rrt:1 rrl:1
         rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:3.60W operational enlat:0 exlat:0 rrt:2 rrl:2
         rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0500W non-operational enlat:5000 exlat:10000 rrt:3 rrl:3
         rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0025W non-operational enlat:8000 exlat:45000 rrt:4 rrl:4
         rwt:4 rwl:4 idle_power:- active_power:-
Reported-and-tested-by: Chang Feng <flukehn@gmail.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

d5d3c100

nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760 · 80b26240

Abhijit authored Oct 10, 2022

Add a quirk to fix Lexar NM760 SSD drives reporting duplicate nsids.
Signed-off-by: Abhijit <abhijit@abhijittomar.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

80b26240

nvme-tcp: fix possible hang caused during ctrl deletion · c4abd875

Sagi Grimberg authored Sep 28, 2022

When we delete a controller, we execute the following:
1. nvme_stop_ctrl() - stop some work elements that may be
	inflight or scheduled (specifically also .stop_ctrl
	which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work
	to avoid competing ns addition/removal
3. continue to teardown the controller

However, if err_work was scheduled to run in (1), it is designed to
cancel any inflight I/O, particularly I/O that is originating from ns
scan_work in (2), but because it is cancelled in .stop_ctrl(), we can
prevent forward progress of (2) as ns scanning is blocking on I/O
(that will never be cancelled).

The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generate I/O to it
3. nvme_ctop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
   - err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed
to be cancelled by err_work, but was cancelled before executing (see
stack trace [1]).

Fix this by flushing err_work instead of cancelling it, to force it
to execute and cancel all inflight I/O.

[1]:
--
Call Trace:
 <TASK>
 __schedule+0x390/0x910
 ? scan_shadow_nodes+0x40/0x40
 schedule+0x55/0xe0
 io_schedule+0x16/0x40
 do_read_cache_page+0x55d/0x850
 ? __page_cache_alloc+0x90/0x90
 read_cache_page+0x12/0x20
 read_part_sector+0x3f/0x110
 amiga_partition+0x3d/0x3e0
 ? osf_partition+0x33/0x220
 ? put_partition+0x90/0x90
 bdev_disk_changed+0x1fe/0x4d0
 blkdev_get_whole+0x7b/0x90
 blkdev_get_by_dev+0xda/0x2d0
 device_add_disk+0x356/0x3b0
 nvme_mpath_set_live+0x13c/0x1a0 [nvme_core]
 ? nvme_parse_ana_log+0xae/0x1a0 [nvme_core]
 nvme_update_ns_ana_state+0x3a/0x40 [nvme_core]
 nvme_mpath_add_disk+0x120/0x160 [nvme_core]
 nvme_alloc_ns+0x594/0xa00 [nvme_core]
 nvme_validate_or_alloc_ns+0xb9/0x1a0 [nvme_core]
 ? __nvme_submit_sync_cmd+0x1d2/0x210 [nvme_core]
 nvme_scan_work+0x281/0x410 [nvme_core]
 process_one_work+0x1be/0x380
 worker_thread+0x37/0x3b0
 ? process_one_work+0x380/0x380
 kthread+0x12d/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x1f/0x30
 </TASK>
INFO: task nvme:6725 blocked for more than 491 seconds.
      Not tainted 5.15.65-f0.el7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nvme            state:D
 stack:    0 pid: 6725 ppid:  1761 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x390/0x910
 ? sched_clock+0x9/0x10
 schedule+0x55/0xe0
 schedule_timeout+0x24b/0x2e0
 ? try_to_wake_up+0x358/0x510
 ? finish_task_switch+0x88/0x2c0
 wait_for_completion+0xa5/0x110
 __flush_work+0x144/0x210
 ? worker_attach_to_pool+0xc0/0xc0
 flush_work+0x10/0x20
 nvme_remove_namespaces+0x41/0xf0 [nvme_core]
 nvme_do_delete_ctrl+0x47/0x66 [nvme_core]
 nvme_sysfs_delete.cold.96+0x8/0xd [nvme_core]
 dev_attr_store+0x14/0x30
 sysfs_kf_write+0x38/0x50
 kernfs_fop_write_iter+0x146/0x1d0
 new_sync_write+0x114/0x1b0
 ? intel_pmu_handle_irq+0xe0/0x420
 vfs_write+0x18d/0x270
 ksys_write+0x61/0xe0
 __x64_sys_write+0x1a/0x20
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
--

Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c4abd875

nvme-rdma: fix possible hang caused during ctrl deletion · a1ae8d4d

Sagi Grimberg authored Sep 28, 2022

When we delete a controller, we execute the following:
1. nvme_stop_ctrl() - stop some work elements that may be
        inflight or scheduled (specifically also .stop_ctrl
        which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work
        to avoid competing ns addition/removal
3. continue to teardown the controller

However, if err_work was scheduled to run in (1), it is designed to
cancel any inflight I/O, particularly I/O that is originating from ns
scan_work in (2), but because it is cancelled in .stop_ctrl(), we can
prevent forward progress of (2) as ns scanning is blocking on I/O
(that will never be cancelled).

The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generate I/O to it
3. nvme_ctop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
   - err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed
to be cancelled by err_work, but was cancelled before executing.

Fix this by flushing err_work instead of cancelling it, to force it
to execute and cancel all inflight I/O.

Fixes: b435ecea ("nvme: Add .stop_ctrl to nvme ctrl ops")
Fixes: f6c8e432 ("nvme: flush namespace scanning work just before removing namespaces")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

a1ae8d4d

10 Oct, 2022 3 commits

Merge branch 'for-6.1/block' into block-6.1 · 24a40334

Jens Axboe authored Oct 10, 2022

Merge in later fixes.

* for-6.1/block:
  block: fix leaking minors of hidden disks
  block: avoid sign extend problem with default queue flags mask
  blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
  block: Remove the repeat word 'can'
  MAINTAINERS: Update SED-Opal Maintainers

24a40334

block: fix leaking minors of hidden disks · a0a6314a

Christoph Hellwig authored Oct 10, 2022

The major/minor of a hidden gendisk is not propagated to the block
device because it is never registered using bdev_add.  But the lack of
bd_dev also causes the dynamic major minor number not to be freed.
Assign bd_dev manually to ensure the dynamic major minor gets freed.

Based on a patch by Keith Busch.

Fixes: 8ddcd653 ("block: introduce GENHD_FL_HIDDEN")
Reported-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20221010131857.748129-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

a0a6314a

block: avoid sign extend problem with default queue flags mask · ca5eebda

Brian Foster authored Oct 03, 2022

request_queue->queue_flags is unsigned long, which is 8-bytes on
64-bit architectures. Most queue flag modifications occur through
bit field helpers, but default flags can be logically OR'd via the
QUEUE_FLAG_MQ_DEFAULT mask. If this mask happens to include bit 31,
the assignment can sign extend the field and set all upper 32 bits.

This exact problem has been observed on a downstream kernel that
happens to use bit 31 for QUEUE_FLAG_NOWAIT. This is not an
immediate problem for current upstream because bit 31 is not
included in the default flag assignment (and is not used at all,
actually). Regardless, fix up the QUEUE_FLAG_MQ_DEFAULT mask
definition to avoid the landmine in the future.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221003133534.1075582-1-bfoster@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

ca5eebda

09 Oct, 2022 11 commits

Merge tag 'ucount-rlimits-cleanups-for-v5.19' of... · 493ffd66

Linus Torvalds authored Oct 09, 2022

Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ucounts update from Eric Biederman:
 "Split rlimit and ucount values and max values

  After the ucount rlimit code was merged a bunch of small but
  siginificant bugs were found and fixed. At the time it was realized
  that part of the problem was that while the ucount rlimits were very
  similar to the oridinary ucounts (in being nested counts with limits)
  the semantics were slightly different and the code would be less error
  prone if there was less sharing.

  This is the long awaited cleanup that should hopefully keep things
  more comprehensible and less error prone for whoever needs to touch
  that code next"

* tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Split rlimit and ucount values and max values

493ffd66

Merge tag 'signal-for-v5.20' of... · e572410e

Linus Torvalds authored Oct 09, 2022

Merge tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ptrace update from Eric Biederman:
 "ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

  Recently I had a conversation where it was pointed out to me that
  SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
  difficult for a tracer to handle.

  Keeping SIGKILL working after the process has been killed is pain from
  an implementation point of view.

  So since the debuggers don't want this behavior let's see if we can
  remove this wart for the userspace API

  If a regression is detected it should only need to be the last change
  that is the reverted. The other two are just general cleanups that
  make the last patch simpler"

* tag 'signal-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Drop signals received after a fatal signal has been processed
  signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
  signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit

e572410e

Merge tag 'retire_mq_sysctls-for-v5.19' of... · 86fb9c53

Linus Torvalds authored Oct 09, 2022

Merge tag 'retire_mq_sysctls-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull mqueue fix from Eric Biederman:
 "A fix for an unlikely but possible memory leak"

* tag 'retire_mq_sysctls-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ipc: mqueue: fix possible memory leak in init_mqueue_fs()

86fb9c53

Merge tag 'interrupting_kthread_stop-for-v5.20' of... · c71370bd

Linus Torvalds authored Oct 09, 2022

Merge tag 'interrupting_kthread_stop-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull kthread update from Eric Biederman:
 "Break out of wait loops on kthread_stop()

  This is a small tweak to kthread_stop so it breaks out of
  interruptible waits, that don't explicitly test for kthread_stop.

  These interruptible waits occassionaly occur in kernel threads do to
  code sharing"

* tag 'interrupting_kthread_stop-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: break out of wait loops on kthread_stop()

c71370bd

Merge tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 4899a36f

Linus Torvalds authored Oct 09, 2022

Pull powerpc updates from Michael Ellerman:

- Remove our now never-true definitions for pgd_huge() and p4d_leaf().

- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.

- Add support for syscall wrappers.

- Add support for KFENCE on 64-bit.

- Update 64-bit HV KVM to use the new guest state entry/exit accounting
API.

- Support execute-only memory when using the Radix MMU (P9 or later).

- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.

- Updates to our linker script to move more data into read-only
sections.

- Allow the VDSO to be randomised on 32-bit.

- Many other small features and fixes.

Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.

* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
KVM: PPC: Book3S HV: Fix stack frame regs marker
powerpc: Don't add __powerpc_ prefix to syscall entry points
powerpc/64s/interrupt: Fix stack frame regs marker
powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
powerpc/pseries: Add firmware details to the hardware description
powerpc/powernv: Add opal details to the hardware description
powerpc: Add device-tree model to the hardware description
powerpc/64: Add logical PVR to the hardware description
powerpc: Add PVR & CPU name to hardware description
powerpc: Add hardware description string
powerpc/configs: Enable PPC_UV in powernv_defconfig
powerpc/configs: Update config files for removed/renamed symbols
powerpc/mm: Fix UBSAN warning reported on hugetlb
powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
powerpc: Drops STABS_DEBUG from linker scripts
powerpc/64s: Remove lost/old comment
powerpc/64s: Remove old STAB comment
powerpc: remove orphan systbl_chk.sh
...

4899a36f

Merge tag 's390-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 03785a69

Linus Torvalds authored Oct 09, 2022

Pull s390 updates from Vasily Gorbik:

 - Make use of the IBM z16 processor activity instrumentation facility
   extension to count neural network processor assist operations: add a
   new PMU device driver so that perf can make use of this.

 - Rework memcpy_real() to avoid DAT-off mode.

 - Rework absolute lowcore access code.

 - Various small fixes and improvements all over the code.

* tag 's390-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: remove unused bus_next field from struct zpci_dev
  s390/cio: remove unused ccw_device_force_console() declaration
  s390/pai: Add support for PAI Extension 1 NNPA counters
  s390/mm: fix no previous prototype warnings in maccess.c
  s390/mm: uninline copy_oldmem_kernel() function
  s390/mm,ptdump: add real memory copy page markers
  s390/mm: rework memcpy_real() to avoid DAT-off mode
  s390/dump: save IPL CPU registers once DAT is available
  s390/pci: convert high_memory to physical address
  s390/smp,ptdump: add absolute lowcore markers
  s390/smp: rework absolute lowcore access
  s390/smp: call smp_reinit_ipl_cpu() before scheduler is available
  s390/ptdump: add missing amode31 markers
  s390/mm: split lowcore pages with set_memory_4k()
  s390/mm: remove unused access parameter from do_fault_error()
  s390/delay: sync comment within __delay() with reality
  s390: move from strlcpy with unused retval to strscpy

03785a69

Merge tag 'riscv-for-linus-6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 2e64066d

Linus Torvalds authored Oct 09, 2022

Pull RISC-V updates from Palmer Dabbelt:

 - Improvements to the CPU topology subsystem, which fix some issues
   where RISC-V would report bad topology information.

 - The default NR_CPUS has increased to XLEN, and the maximum
   configurable value is 512.

 - The CD-ROM filesystems have been enabled in the defconfig.

 - Support for THP_SWAP has been added for rv64 systems.

There are also a handful of cleanups and fixes throughout the tree.

* tag 'riscv-for-linus-6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: enable THP_SWAP for RV64
  RISC-V: Print SSTC in canonical order
  riscv: compat: s/failed/unsupported if compat mode isn't supported
  RISC-V: Increase range and default value of NR_CPUS
  cpuidle: riscv-sbi: Fix CPU_PM_CPU_IDLE_ENTER_xyz() macro usage
  perf: RISC-V: throttle perf events
  perf: RISC-V: exclude invalid pmu counters from SBI calls
  riscv: enable CD-ROM file systems in defconfig
  riscv: topology: fix default topology reporting
  arm64: topology: move store_cpu_topology() to shared code

2e64066d

Merge tag 'microblaze-v6.1' of git://git.monstr.eu/linux-2.6-microblaze · 57c92724

Linus Torvalds authored Oct 09, 2022

Pull microblaze updates from Michal Simek:
 "This adds architecture support for error injection which can be done
  only via local memory (BRAM) with enabling path for recovery after
  reset.

  These patches targets Triple Modular Redundacy (TMR) configuration
  where 3 Microblazes are running in parallel with monitoring logic.

  When an error happens (or is injected) system goes to break handler
  with full CPU reset and system recovery back to origin context. More
  information can be found at [1]"

Link: https://www.xilinx.com/content/dam/xilinx/support/documents/ip_documentation/tmr/v1_0/pg268-tmr.pdf [1]

* tag 'microblaze-v6.1' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Add support for error injection
  microblaze: Add custom break vector handler for mb manager
  microblaze: Add xmb_manager_register function

57c92724

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · ef688f8b

Linus Torvalds authored Oct 09, 2022

Pull kvm updates from Paolo Bonzini:
 "The first batch of KVM patches, mostly covering x86.

  ARM:

   - Account stage2 page table allocations in memory stats

  x86:

   - Account EPT/NPT arm64 page table allocations in memory stats

   - Tracepoint cleanups/fixes for nested VM-Enter and emulated MSR
     accesses

   - Drop eVMCS controls filtering for KVM on Hyper-V, all known
     versions of Hyper-V now support eVMCS fields associated with
     features that are enumerated to the guest

   - Use KVM's sanitized VMCS config as the basis for the values of
     nested VMX capabilities MSRs

   - A myriad event/exception fixes and cleanups. Most notably, pending
     exceptions morph into VM-Exits earlier, as soon as the exception is
     queued, instead of waiting until the next vmentry. This fixed a
     longstanding issue where the exceptions would incorrecly become
     double-faults instead of triggering a vmexit; the common case of
     page-fault vmexits had a special workaround, but now it's fixed for
     good

   - A handful of fixes for memory leaks in error paths

   - Cleanups for VMREAD trampoline and VMX's VM-Exit assembly flow

   - Never write to memory from non-sleepable kvm_vcpu_check_block()

   - Selftests refinements and cleanups

   - Misc typo cleanups

  Generic:

   - remove KVM_REQ_UNHALT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits)
  KVM: remove KVM_REQ_UNHALT
  KVM: mips, x86: do not rely on KVM_REQ_UNHALT
  KVM: x86: never write to memory from kvm_vcpu_check_block()
  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events
  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending
  KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter
  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set
  KVM: x86: lapic does not have to process INIT if it is blocked
  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific
  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed
  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit
  KVM: x86: make vendor code check for all nested events
  mailmap: Update Oliver's email address
  KVM: x86: Allow force_emulation_prefix to be written without a reload
  KVM: selftests: Add an x86-only test to verify nested exception queueing
  KVM: selftests: Use uapi header to get VMX and SVM exit reasons/codes
  KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()
  KVM: VMX: Update MTF and ICEBP comments to document KVM's subtle behavior
  KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions
  KVM: x86: Morph pending exceptions to pending VM-Exits at queue time
  ...

ef688f8b

Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 0e470763

Linus Torvalds authored Oct 09, 2022

Pull EFI updates from Ard Biesheuvel:
 "A bit more going on than usual in the EFI subsystem. The main driver
  for this has been the introduction of the LoonArch architecture last
  cycle, which inspired some cleanup and refactoring of the EFI code.
  Another driver for EFI changes this cycle and in the future is
  confidential compute.

  The LoongArch architecture does not use either struct bootparams or DT
  natively [yet], and so passing information between the EFI stub and
  the core kernel using either of those is undesirable. And in general,
  overloading DT has been a source of issues on arm64, so using DT for
  this on new architectures is a to avoid for the time being (even if we
  might converge on something DT based for non-x86 architectures in the
  future). For this reason, in addition to the patch that enables EFI
  boot for LoongArch, there are a number of refactoring patches applied
  on top of which separate the DT bits from the generic EFI stub bits.
  These changes are on a separate topich branch that has been shared
  with the LoongArch maintainers, who will include it in their pull
  request as well. This is not ideal, but the best way to manage the
  conflicts without stalling LoongArch for another cycle.

  Another development inspired by LoongArch is the newly added support
  for EFI based decompressors. Instead of adding yet another
  arch-specific incarnation of this pattern for LoongArch, we are
  introducing an EFI app based on the existing EFI libstub
  infrastructure that encapulates the decompression code we use on other
  architectures, but in a way that is fully generic. This has been
  developed and tested in collaboration with distro and systemd folks,
  who are eager to start using this for systemd-boot and also for arm64
  secure boot on Fedora. Note that the EFI zimage files this introduces
  can also be decompressed by non-EFI bootloaders if needed, as the
  image header describes the location of the payload inside the image,
  and the type of compression that was used. (Note that Fedora's arm64
  GRUB is buggy [0] so you'll need a recent version or switch to
  systemd-boot in order to use this.)

  Finally, we are adding TPM measurement of the kernel command line
  provided by EFI. There is an oversight in the TCG spec which results
  in a blind spot for command line arguments passed to loaded images,
  which means that either the loader or the stub needs to take the
  measurement. Given the combinatorial explosion I am anticipating when
  it comes to firmware/bootloader stacks and firmware based attestation
  protocols (SEV-SNP, TDX, DICE, DRTM), it is good to set a baseline now
  when it comes to EFI measured boot, which is that the kernel measures
  the initrd and command line. Intermediate loaders can measure
  additional assets if needed, but with the baseline in place, we can
  deploy measured boot in a meaningful way even if you boot into Linux
  straight from the EFI firmware.

  Summary:

   - implement EFI boot support for LoongArch

   - implement generic EFI compressed boot support for arm64, RISC-V and
     LoongArch, none of which implement a decompressor today

   - measure the kernel command line into the TPM if measured boot is in
     effect

   - refactor the EFI stub code in order to isolate DT dependencies for
     architectures other than x86

   - avoid calling SetVirtualAddressMap() on arm64 if the configured
     size of the VA space guarantees that doing so is unnecessary

   - move some ARM specific code out of the generic EFI source files

   - unmap kernel code from the x86 mixed mode 1:1 page tables"

* tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  efi/arm64: libstub: avoid SetVirtualAddressMap() when possible
  efi: zboot: create MemoryMapped() device path for the parent if needed
  efi: libstub: fix up the last remaining open coded boot service call
  efi/arm: libstub: move ARM specific code out of generic routines
  efi/libstub: measure EFI LoadOptions
  efi/libstub: refactor the initrd measuring functions
  efi/loongarch: libstub: remove dependency on flattened DT
  efi: libstub: install boot-time memory map as config table
  efi: libstub: remove DT dependency from generic stub
  efi: libstub: unify initrd loading between architectures
  efi: libstub: remove pointless goto kludge
  efi: libstub: simplify efi_get_memory_map() and struct efi_boot_memmap
  efi: libstub: avoid efi_get_memory_map() for allocating the virt map
  efi: libstub: drop pointless get_memory_map() call
  efi: libstub: fix type confusion for load_options_size
  arm64: efi: enable generic EFI compressed boot
  loongarch: efi: enable generic EFI compressed boot
  riscv: efi: enable generic EFI compressed boot
  efi/libstub: implement generic EFI zboot
  efi/libstub: move efi_system_table global var into separate object
  ...

0e470763

blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() · 285febab

Yu Kuai authored Oct 09, 2022

commit 8c5035df ("blk-wbt: call rq_qos_add() after wb_normal is
initialized") moves wbt_set_write_cache() before rq_qos_add(), which
is wrong because wbt_rq_qos() is still NULL.

Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc'
directly. Noted that this patch also remove the redundant setting of
'rab->wc'.

Fixes: 8c5035df ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.comSigned-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20221009101038.1692875-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

285febab

08 Oct, 2022 3 commits

Merge tag 'mailbox-v6.1' of git://git.linaro.org/landing-teams/working/fujitsu/integration · a6afa419

Linus Torvalds authored Oct 08, 2022

Pull mailbox updates from Jassi Brar:

 - apple: implement poll and flush callbacks

 - qcom: fix clocks for IPQ6018 and IPQ8074 irq handler as not-a-thread

 - microchip: split reg-space into two

 - imx: RST channel fix

 - bcm: fix dma_map_sg error handling

 - misc: spelling fix in pcc driver

* tag 'mailbox-v6.1' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: qcom-ipcc: flag IRQ NO_THREAD
  mailbox: pcc: Fix spelling mistake "Plaform" -> "Platform"
  mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg
  mailbox: qcom-apcs-ipc: add IPQ8074 APSS clock support
  dt-bindings: mailbox: qcom: correct clocks for IPQ6018 and IPQ8074
  dt-bindings: mailbox: qcom: set correct #clock-cells
  mailbox: mpfs: account for mbox offsets while sending
  mailbox: mpfs: fix handling of the reg property
  dt-bindings: mailbox: fix the mpfs' reg property
  mailbox: imx: fix RST channel support
  mailbox: apple: Implement poll_data() operation
  mailbox: apple: Implement flush() operation

a6afa419

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · bdc753c7

Linus Torvalds authored Oct 08, 2022

Pull clk updates from Stephen Boyd:
 "We have some late breaking reports that a patch series to rework clk
  rate range support broke boot on some devices, so I've left that
  branch out of this. Hopefully we can get to that next week, or punt on
  it and let it bake another cycle. That means we don't really have any
  changes to the core framework this time around besides a few typo
  fixes. Instead this is all clk driver updates and fixes.

  The usual suspects are here (again), with Qualcomm dominating the
  diffstat. We look to have gained support for quite a few new Qualcomm
  SoCs and Dmitry worked on updating many of the existing Qualcomm
  drivers to use clk_parent_data. After that we have MediaTek drivers
  getting some much needed updates, in particular to support GPU DVFS.
  There are also quite a few Samsung clk driver patches, but that's
  mostly because there was a maintainer change and so last release we
  missed some of those patches.

  Overall things look normal, but I'm slowly reviewing core framework
  code nowadays and that shows given the rate range patches had to be
  yanked last minute. Let's hope this situation changes soon.

  New Drivers:
   - Support for Renesas VersaClock7 clock generator family
   - Add Spreadtrum UMS512 SoC clk support
   - New clock drivers for MediaTek Helio X10 MT6795
   - Display clks for Qualcomm SM6115, SM8450
   - GPU clks for Qualcomm SC8280XP
   - Qualcomm MSM8909 and SM6375 global and SMD RPM clk drivers

  Deleted Drivers:
   - Remove DaVinci DM644x and DM646x clk driver support

  Updates:
   - Convert Baikal-T1 CCU driver to platform driver
   - Split reset support out of primary Baikal-T1 CCU driver
   - Add some missing clks required for RPiVid Video Decoder on
     RaspberryPi
   - Mark PLLC critical on bcm2835
   - More devm helpers for fixed rate registration
   - Various PXA168 clk driver fixes
   - Add resets for MediaTek MT8195 PCIe and USB
   - Miscellaneous of_node_put() fixes
   - Nuke dt-bindings/clk path (again) by moving headers to
     dt-bindings/clock
   - Convert gpio-clk-gate binding to YAML
   - Various fixes to AMD/Xilinx Zynqmp clk driver
   - Graduate AMD/Xilinx "clocking wizard" driver from staging
   - Add missing DPI1_HDMI clock in MT8195 VDOSYS1
   - Clock driver changes to support GPU DVFS on MT8183, MT8192, MT8195
   - Fix GPU clock topology on MT8195
   - Propogate rate changes from GPU clock gate up the tree
   - Clock mux notifiers for GPU-related PLLs
   - Conversion of more "simple" drivers to mtk_clk_simple_probe()
   - Hook up mtk_clk_simple_remove() for "simple" MT8192 clock drivers
   - Fixes to previous |struct clk| to |struct clk_hw| conversion on
     MediaTek
   - Shrink MT8192 clock driver by deduplicating clock parent lists
   - Change order between 'sim_enet_root_clk' and 'enet_qos_root_clk'
     clocks for i.MX8MP
   - Drop unnecessary newline in i.MX8MM dt-bindings
   - Add more MU1 and SAI clocks dt-bindings Ids
   - Introduce slice busy bit check for i.MX93 composite clock
   - Introduce white list bit check for i.MX93 composite clock
   - Add new i.MX93 clock gate
   - Add MU1 and MU2 clocks to i.MX93 clock provider
   - Add SAI IPG clocks to i.MX93 clock provider
   - add generic clocks for U(S)ART available on SAMA5D2 SoCs
   - reset controller support for Polarfire clocks
   - .round_rate and .set rate support for clk-mpfs
   - code cleanup for clk-mpfs
   - PLL support for PolarFire SoC's Clock Conditioning Circuitry
   - Add watchdog, I2C, pin control/GPIO, and Ethernet clocks on R-Car
     V4H
   - Add SDHI, Timer (CMT/TMU), and SPI (MSIOF) clocks on R-Car S4-8
   - Add I2C clocks and resets on RZ/V2M
   - Document clock support for the RZ/Five SoC
   - mux-variant clock using the table variant to select parents
   - clock controller for the rv1126 soc
   - conversion of rk3128 to yaml and relicensing of the yaml bindings
     to gpl2+MIT (following dt-binding guildelines)
   - Exynos7885: add FSYS, TREX and MFC clock controllers
   - Exynos850: add IS and AUD (audio) clock controllers with bindings
   - ExynosAutov9: add FSYS clock controllers with bindings
   - ExynosAutov9: correct clock IDs in bindings of Peric 0 and 1 clock
     controllers, due to duplicated entries. This is an acceptable ABI
     break: recently developed/added platform so without legacies, acked
     by known users/developers
   - ExynosAutov9: add few missing Peric 0/1 gates
   - ExynosAutov9: correct register offsets of few Peric 0/1 clocks
   - Minor code improvements (use of_device_get_match_data() helper,
     code style)
   - Add Krzysztof Kozlowski as co-maintainer of Samsung SoC clocks, as
     he already maintainers that architecture/platform
   - Keep Qualcomm GDSCs enabled when PWRSTS_RET flag is there, solving
     retention issues during suspend of USB on Qualcomm sc7180/sc7280
     and SC8280XP
   - Qualcomm SM6115 and QCM2260 are moved to reuse PLL configuration
   - Qualcomm SDM660 SDCC1 moved to floor clk ops
   - Support for the APCS PLLs for Qualcomm IPQ8064, IPQ8074 and IPQ6018
     was added/fixed
   - The Qualcomm MSM8996 CPU clocks are updated with support for ACD
   - Support for Qualcomm SDM670 GCC and RPMh clks was added
   - Transition to parent_data, parent_hws and use of ARRAY_SIZE() for
     num_parents was done for many Qualcomm SoCs
   - Support for per-reset defined delay on Qualcomm was introduced"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (283 commits)
  clk: qcom: gcc-sm6375: Ensure unsigned long type
  clk: qcom: gcc-sm6375: Remove unused variables
  clk: qcom: kpss-xcc: convert to parent data API
  clk: introduce (devm_)hw_register_mux_parent_data_table API
  clk: allow building lan966x as a module
  clk: clk-xgene: simplify if-if to if-else
  clk: ast2600: BCLK comes from EPLL
  clk: clocking-wizard: Depend on HAS_IOMEM
  clk: clocking-wizard: Use dev_err_probe() helper
  clk: nxp: fix typo in comment
  clk: pxa: add a check for the return value of kzalloc()
  clk: vc5: Add support for IDT/Renesas VersaClock 5P49V6975
  dt-bindings: clock: vc5: Add 5P49V6975
  clk: mvebu: armada-37xx-tbg: Remove the unneeded result variable
  clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
  clk: Renesas versaclock7 ccf device driver
  dt-bindings: Renesas versaclock7 device tree bindings
  clk: ti: Balance of_node_get() calls for of_find_node_by_name()
  clk: imx: scu: fix memleak on platform_device_add() fails
  clk: vc5: Use regmap_{set,clear}_bits() where appropriate
  ...

bdc753c7

Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · f0160397

Linus Torvalds authored Oct 08, 2022

Pull gpio updates from Bartosz Golaszewski:
 "We have a single new driver, support for a bunch of new models,
  improvements in drivers and core gpiolib code as well device-tree
  bindings changes.

  Summary:

  New driver:
   - IMX System Controller Unit GPIOs

  GPIO core:
   - add fdinfo output for the GPIO character device file descriptors
     (allows user-space to determine which processes own which GPIO
     lines)
   - improvements to OF GPIO code
   - new quirk for Asus UM325UAZ in gpiolib-acpi
   - new quirk for Freescale SPI in gpiolib-of

  Driver improvements:
   - add a new macro that reduces the amount of boilerplate code in ISA
     drivers and use it in relevant drivers
   - support two new models in gpio-pca953x
   - support new model in gpio-f7188x
   - convert more drivers to use immutable irq chips
   - other minor tweaks

  Device-tree bindings:
   - add DT bindings for gpio-imx-scu
   - convert Xilinx GPIO bindings to YAML
   - reference the properties from the SPI peripheral device-tree
     bindings instead of providing custom ones in the GPIO controller
     document
   - add parsing of GPIO hog nodes to the DT bindings for gpio-mpfs-gpio
   - relax the node name requirements in gpio-stmpe
   - add new models for gpio-rcar and gpio-pxa95xx
   - add a new vendor prefix: Diodes (for Diodes, Inc.)

  Misc:
   - pulled in the immutable branch from the x86 platform drivers tree
     including support for a new simatic board that depends on GPIO
     changes"

* tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (36 commits)
  gpio: tc3589x: Make irqchip immutable
  gpiolib: cdev: add fdinfo output for line request file descriptors
  gpio: twl4030: Reorder functions which allows to drop a forward declaraion
  gpiolib: fix OOB access in quirk callbacks
  gpiolib: of: factor out conversion from OF flags
  gpiolib: rework quirk handling in of_find_gpio()
  gpiolib: of: make Freescale SPI quirk similar to all others
  gpiolib: of: do not ignore requested index when applying quirks
  gpio: ws16c48: Ensure number of irq matches number of base
  gpio: 104-idio-16: Ensure number of irq matches number of base
  gpio: 104-idi-48: Ensure number of irq matches number of base
  gpio: 104-dio-48e: Ensure number of irq matches number of base
  counter: 104-quad-8: Ensure number of irq matches number of base
  isa: Introduce the module_isa_driver_with_irq helper macro
  gpio: pca953x: Add support for PCAL6534
  gpio: pca953x: Swap if statements to save later complexity
  gpio: pca953x: Fix pca953x_gpio_set_pull_up_down()
  dt-bindings: gpio: pca95xx: add entry for pcal6534 and PI4IOE5V6534Q
  dt-bindings: vendor-prefixes: add Diodes
  gpio: mt7621: Switch to use platform_get_irq() function
  ...

f0160397