Commits · b8b9ffced017528bcdd262730ab10bc5084c3bb4 · Kirill Smelkov / linux

11 Feb, 2023 26 commits

Merge branch 'for-6.3/cxl-ram-region' into cxl/next · b8b9ffce

Dan Williams authored Feb 10, 2023

Include the support for enumerating and provisioning ram regions for
v6.3. This also include a default policy change for ram / volatile
device-dax instances to assign them to the dax_kmem driver by default.

b8b9ffce

Merge branch 'for-6.3/cxl' into cxl/next · dfd423e0

Dan Williams authored Feb 10, 2023

Pick up some final miscellaneous updates for v6.3 including support for
communicating 'exclusive' and 'enabled' state of commands.

dfd423e0

cxl/mem: Fix UAPI command comment · af73370d

Ira Weiny authored Feb 02, 2023

The command comment had grammatical errors.  In an attempt to fix those
it was noted that the comment and the query command were not in sync.

Now that the query command returns excluded and device unsupported
command information.  Update the kdoc and fix the grammatical errors.

[1] https://lore.kernel.org/all/63b4ec4e37cc1_5178e2941d@dwillia2-xfh.jf.intel.com.notmuch/Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221222-cxl-misc-v4-4-62f701c1cdd1@intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

af73370d

cxl/uapi: Tag commands from cxl_query_cmd() · 814a15f3

Ira Weiny authored Feb 02, 2023

It was pointed out that commands not supported by the device or excluded
by the kernel were being returned in cxl_query_cmd().[1]

While libcxl correctly handles failing commands, it is more efficient to
not issue an invalid command in the first place.  This can't be done
without additional information being returned from cxl_query_cmd().  In
addition, information about the availability of commands can be useful
for debugging.

Add flags to struct cxl_command_info which reflect if a command is
enabled and/or exclusive to the kernel.

[1] https://lore.kernel.org/all/63b4ec4e37cc1_5178e2941d@dwillia2-xfh.jf.intel.com.notmuch/Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221222-cxl-misc-v4-3-62f701c1cdd1@intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

814a15f3

cxl/uapi: Add warning on CXL command enum · 11ef026e

Ira Weiny authored Feb 02, 2023

The CXL command enum is exported to user space and must maintain
backwards compatibility.

Add comment that new defines must be added to the end of the list.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221222-cxl-misc-v4-2-62f701c1cdd1@intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

11ef026e

cxl/mem: Remove unused CXL_CMD_FLAG_NONE define · 860334e5

Ira Weiny authored Feb 02, 2023

CXL_CMD_FLAG_NONE is not used, remove it.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221222-cxl-misc-v4-1-62f701c1cdd1@intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

860334e5

cxl/dax: Create dax devices for CXL RAM regions · 09d09e04

Dan Williams authored Feb 10, 2023

While platform firmware takes some responsibility for mapping the RAM
capacity of CXL devices present at boot, the OS is responsible for
mapping the remainder and hot-added devices. Platform firmware is also
responsible for identifying the platform general purpose memory pool,
typically DDR attached DRAM, and arranging for the remainder to be 'Soft
Reserved'. That reservation allows the CXL subsystem to route the memory
to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated
access (device-dax).

The new 'struct cxl_dax_region' object allows for a CXL memory resource
(region) to be published, but also allow for udev and module policy to
act on that event. It also prevents cxl_core.ko from having a module
loading dependency on any drivers/dax/ modules.
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167602003896.1924368.10335442077318970468.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

09d09e04

dax: Assign RAM regions to memory-hotplug by default · e9ee9fe3

Dan Williams authored Feb 10, 2023

The default mode for device-dax instances is backwards for RAM-regions
as evidenced by the fact that it tends to catch end users by surprise.
"Where is my memory?". Recall that platforms are increasingly shipping
with performance-differentiated memory pools beyond typical DRAM and
NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic
interleave, varied media types, and future fabric attached
possibilities).

For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux
'Soft Reserved') attribute is expected to be applied to all memory-pools
that are not the general purpose pool. This designation gives an
Operating System a chance to defer usage of a memory pool until later in
the boot process where its performance properties can be interrogated
and administrator policy can be applied.

'Soft Reserved' memory can be anything from too limited and precious to
be part of the general purpose pool (HBM), too slow to host hot kernel
data structures (some PMEM media), or anything in between. However, in
the absence of an explicit policy, the memory should at least be made
usable by default. The current device-dax default hides all
non-general-purpose memory behind a device interface.

The expectation is that the distribution of users that want the memory
online by default vs device-dedicated-access by default follows the
Pareto principle. A small number of enlightened users may want to do
userspace memory management through a device, but general users just
want the kernel to make the memory available with an option to get more
advanced later.

Arrange for all device-dax instances not backed by PMEM to default to
attaching to the dax_kmem driver. From there the baseline memory hotplug
policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=)
gates whether the memory comes online or stays offline. Where, if it
stays offline, it can be reliably converted back to device-mode where it
can be partitioned, or fronted by a userspace allocator.

So, if someone wants device-dax instances for their 'Soft Reserved'
memory:

1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot
   with memhp_default_state=offline, or roll the dice and hope that the
   kernel has not pinned a page in that memory before step 2.

2/ Write a udev rule to convert the target dax device(s) from
   'system-ram' mode to 'devdax' mode:

   daxctl reconfigure-device $dax -m devdax -f

Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167602003336.1924368.6809503401422267885.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

e9ee9fe3

dax/hmem: Move hmem device registration to dax_hmem.ko · 7dab174e

Dan Williams authored Feb 10, 2023

In preparation for the CXL region driver to take over the responsibility
of registering device-dax instances for CXL regions, move the
registration of "hmem" devices to dax_hmem.ko.

Previously the builtin component of this enabling
(drivers/dax/hmem/device.o) would register platform devices for each
address range and trigger the dax_hmem.ko module to load and attach
device-dax instances to those devices. Now, the ranges are collected
from the HMAT and EFI memory map walking, but the device creation is
deferred. A new "hmem_platform" device is created which triggers
dax_hmem.ko to load and register the platform devices.
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167602002771.1924368.5653558226424530127.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

7dab174e

dax/hmem: Convey the dax range via memregion_info() · fe098574

Dan Williams authored Feb 10, 2023

In preparation for hmem platform devices to be unregistered, stop using
platform_device_add_resources() to convey the address range. The
platform_device_add_resources() API causes an existing "Soft Reserved"
iomem resource to be re-parented under an inserted platform device
resource. When that platform device is deleted it removes the platform
device resource and all children.

Instead, it is sufficient to convey just the address range and let
request_mem_region() insert resources to indicate the devices active in
the range. This allows the "Soft Reserved" resource to be re-enumerated
upon the next probe event.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167602002217.1924368.7036275892522551624.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

fe098574

dax/hmem: Drop unnecessary dax_hmem_remove() · 84fe17f8

Dan Williams authored Feb 10, 2023

Empty driver remove callbacks can just be elided.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167602001664.1924368.9102029637928071240.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

84fe17f8

dax/hmem: Move HMAT and Soft reservation probe initcall level · df2798bc

Dan Williams authored Feb 10, 2023

In preparation for moving more filtering of "hmem" ranges into the
dax_hmem.ko module, update the initcall levels. HMAT range registration
moves to subsys_initcall() to be done before Soft Reservation probing,
and Soft Reservation probing is moved to device_initcall() to be done
before dax_hmem.ko initialization if it is built-in.
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167602001107.1924368.11562316181038595611.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

df2798bc

tools/testing/cxl: Define a fixed volatile configuration to parse · 3d8f7cca

Dan Williams authored Feb 10, 2023

Take two endpoints attached to the first switch on the first host-bridge
in the cxl_test topology and define a pre-initialized region. This is a
x2 interleave underneath a x1 CXL Window.

$ modprobe cxl_test
$ # cxl list -Ru
{
  "region":"region3",
  "resource":"0xf010000000",
  "size":"512.00 MiB (536.87 MB)",
  "interleave_ways":2,
  "interleave_granularity":4096,
  "decode_state":"commit"
}
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167602000547.1924368.11613151863880268868.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

3d8f7cca

cxl/region: Add region autodiscovery · a32320b7

Dan Williams authored Feb 10, 2023

Region autodiscovery is an asynchronous state machine advanced by
cxl_port_probe(). After the decoders on an endpoint port are enumerated
they are scanned for actively enabled instances. Each active decoder is
flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region.
If a region does not already exist for the address range setting of the
decoder one is created. That creation process may race with other
decoders of the same region being discovered since cxl_port_probe() is
asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is
introduced to mitigate that race.

Once all decoders have arrived, "p->nr_targets == p->interleave_ways",
they are sorted by their relative decode position. The sort algorithm
involves finding the point in the cxl_port topology where one leg of the
decode leads to deviceA and the other deviceB. At that point in the
topology the target order in the 'struct cxl_switch_decoder' indicates
the relative position of those endpoint decoders in the region.

>From that point the region goes through the same setup and validation
steps as user-created regions, but instead of programming the decoders
it validates that driver would have written the same values to the
decoders as were already present.
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

a32320b7

cxl/port: Split endpoint and switch port probe · 32ce3f18

Dan Williams authored Feb 10, 2023

Jonathan points out that the shared code between the switch and endpoint
case is small. Before adding another is_cxl_endpoint() conditional,
just split the two cases.

Rather than duplicate the "Couldn't enumerate decoders" error message
take the opportunity to improve the error messages in
devm_cxl_enumerate_decoders().
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167601999378.1924368.15071142145866277623.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

32ce3f18

cxl/region: Enable CONFIG_CXL_REGION to be toggled · 45d235c5

Dan Williams authored Feb 10, 2023

Add help text and a label so the CXL_REGION config option can be
toggled. This is mainly to enable compile testing without region
support.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/167601998765.1924368.258370414771847699.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

45d235c5

kernel/range: Uplevel the cxl subsystem's range_contains() helper · 93c177fd

Dan Williams authored Feb 10, 2023

In support of the CXL subsystem's use of 'struct range' to track decode
address ranges, add a common range_contains() implementation with
identical semantics as resource_contains();

The existing 'range_contains()' in lib/stackinit_kunit.c is namespaced
with a 'stackinit_' prefix.

Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/167601998163.1924368.6067392174077323935.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

93c177fd

cxl/region: Move region-position validation to a helper · 9995576c

Dan Williams authored Feb 10, 2023

In preparation for region autodiscovery, that needs all devices
discovered before their relative position in the region can be
determined, consolidate all position dependent validation in a helper.

Recall that in the on-demand region creation flow the end-user picks the
position of a given endpoint decoder in a region. In the autodiscovery
case the position of an endpoint decoder can only be determined after
all other endpoint decoders that claim to decode the region's address
range have been enumerated and attached. So, in the autodiscovery case
endpoint decoders may be attached before their relative position is
known. Once all decoders arrive, then positions can be determined and
validated with cxl_region_validate_position() the same as user initiated
on-demand creation.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167601997584.1924368.4615769326126138969.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

9995576c

cxl/region: Cleanup target list on attach error · 86987c76

Dan Williams authored Feb 10, 2023

Jonathan noticed that the target list setup is not unwound completely
upon error. Undo all the setup in the 'err_decrement:' exit path.

Fixes: 27b3f8d1 ("cxl/region: Program target lists")
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Link: http://lore.kernel.org/r/20230208123031.00006990@Huawei.comReviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167601996980.1924368.390423634911157277.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

86987c76

cxl/region: Refactor attach_target() for autodiscovery · 3528b1e1

Dan Williams authored Feb 10, 2023

Region autodiscovery is the process of kernel creating 'struct
cxl_region' object to represent active CXL memory ranges it finds
already active in hardware when the driver loads. Typically this happens
when platform firmware establishes CXL memory regions and then publishes
them in the memory map. However, this can also happen in the case of
kexec-reboot after the kernel has created regions.

In the autodiscovery case the region creation process starts with a
known endpoint decoder. Refactor attach_target() into a helper that is
suitable to be called from either sysfs, for runtime region creation, or
from cxl_port_probe() after it has enumerated all endpoint decoders.

The cxl_port_probe() context is an async device-core probing context, so
it is not appropriate to allow SIGTERM to interrupt the assembly
process. Refactor attach_target() to take @cxled and @state as arguments
where @state indicates whether waiting from the region rwsem is
interruptible or not.

No behavior change is intended.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/167601996393.1924368.2202255054618600069.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

3528b1e1

cxl/region: Add volatile region creation support · 6e099264

Dan Williams authored Feb 10, 2023

Expand the region creation infrastructure to enable 'ram'
(volatile-memory) regions. The internals of create_pmem_region_store()
and create_pmem_region_show() are factored out into helpers
__create_region() and __create_region_show() for the 'ram' case to
reuse.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/167601995775.1924368.352616146815830591.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

6e099264

cxl/region: Validate region mode vs decoder mode · 1b9b7a6f

Dan Williams authored Feb 10, 2023

In preparation for a new region mode, do not, for example, allow
'ram' decoders to be assigned to 'pmem' regions and vice versa.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/167601995111.1924368.7459128614177994602.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

1b9b7a6f

cxl/region: Support empty uuids for non-pmem regions · a8e7d558

Dan Williams authored Feb 10, 2023

Shipping versions of the cxl-cli utility expect all regions to have a
'uuid' attribute. In preparation for 'ram' regions, update the 'uuid'
attribute to return an empty string which satisfies the current
expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the
presence of regions with the 'uuid' attribute missing. Force the
attribute to be read-only as there is no facility or expectation for a
'ram' region to recall its uuid from one boot to the next.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167601994558.1924368.12612811533724694444.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

a8e7d558

cxl/region: Add a mode attribute for regions · 7d505f98

Dan Williams authored Feb 10, 2023

In preparation for a new region type, "ram" regions, add a mode
attribute to clarify the mode of the decoders that can be added to a
region. Share the internals of mode_show() (for decoders) with the
region case.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/167601993930.1924368.4305018565539515665.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

7d505f98

cxl/Documentation: Update references to attributes added in v6.0 · 8752efd2

Dan Williams authored Feb 10, 2023

Prior to Linus deciding that the kernel that following v5.19 would be
v6.0, the CXL ABI documentation already referenced v5.20. In preparation
for updating these entries update the kernel version to v6.0.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/167601993360.1924368.14122892663883462813.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

8752efd2

cxl/memdev: Fix endpoint port removal · 2345df54

Dan Williams authored Feb 10, 2023

Testing of ram region support [1], stimulates a long standing bug in
cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to
inability to walk ports after dports have been unregistered. That
results in a failure to re-register a memdev after the port is
re-enabled leading to a crash like the following:

    cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256
    general protection fault, ...
    [..]
    RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core]
    dev_name at include/linux/device.h:700
    (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155
    (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249
    [..]
    Call Trace:
     <TASK>
     attach_target+0x39a/0x760 [cxl_core]
     ? __mutex_unlock_slowpath+0x3a/0x290
     cxl_add_to_region+0xb8/0x340 [cxl_core]
     ? lockdep_hardirqs_on+0x7d/0x100
     discover_region+0x4b/0x80 [cxl_port]
     ? __pfx_discover_region+0x10/0x10 [cxl_port]
     device_for_each_child+0x58/0x90
     cxl_port_probe+0x10e/0x130 [cxl_port]
     cxl_bus_probe+0x17/0x50 [cxl_core]

Change the port ancestry walk to be by depth rather than by dport. This
ensures that even if a port has unregistered its dports a deferred
memdev cleanup will still be able to cleanup the memdev's interest in
that port.

The parent_port->dev.driver check is only needed for determining if the
bottom up removal beat the top-down removal, but cxl_ep_remove() can
always proceed given the port is pinned. That is, the two sources of
cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and
cxl_port_release() can not run if cxl_detach_ep() holds a reference.

Fixes: 2703c16c ("cxl/core/port: Add switch port enumeration")
Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1]
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

2345df54

09 Feb, 2023 1 commit

cxl/mem: Correct full ID range allocation · d874297b

Davidlohr Bueso authored Feb 08, 2023

For ID allocations we want 0-(max-1), ie: smatch complains:

error: Calling ida_alloc_range() with a 'max' argument which is a power of 2. -1 missing?

Correct this and also replace the call to use the max() flavor instead.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230208181944.240261-1-dave@stgolabs.netSigned-off-by: Dan Williams <dan.j.williams@intel.com>

d874297b

07 Feb, 2023 4 commits

Merge branch 'for-6.3/cxl-events' into cxl/next · dbe9f7d1
Dan Williams authored Feb 07, 2023
```
Add the CXL event and interrupt support for the v6.3 update.
```
dbe9f7d1

Merge branch 'for-6.3/cxl' into cxl/next · 5485eb95

Dan Williams authored Feb 07, 2023

Merge the general CXL updates with fixes targeting v6.2-rc for v6.3.
Resolve a conflict with the fix and move of cxl_report_and_clear() from
pci.c to core/pci.c.

5485eb95

cxl/region: Fix passthrough-decoder detection · 711442e2

Dan Williams authored Feb 07, 2023

A passthrough decoder is a decoder that maps only 1 target. It is a
special case because it does not impose any constraints on the
interleave-math as compared to a decoder with multiple targets. Extend
the passthrough case to multi-target-capable decoders that only have one
target selected. I.e. the current code was only considering passthrough
*ports* which are only a subset of the potential passthrough decoder
scenarios.

Fixes: e4f6dfa9 ("cxl/region: Fix 'distance' calculation with passthrough ports")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167564540422.847146.13816934143225777888.stgit@dwillia2-xfh.jf.intel.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

711442e2

cxl/region: Fix null pointer dereference for resetting decoder · 4fa4302d

Fan Ni authored Dec 15, 2022

Not all decoders have a reset callback.

The CXL specification allows a host bridge with a single root port to
have no explicit HDM decoders. Currently the region driver assumes there
are none.  As such the CXL core creates a special pass through decoder
instance without a commit/reset callback.

Prior to this patch, the ->reset() callback was called unconditionally when
calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
1 Root Port, and one directly attached CXL type 3 device or multiple CXL
type 3 devices attached to downstream ports of a switch can cause a null
pointer dereference.

Before the fix, a kernel crash was observed when we destroy the region, and
a pass through decoder is reset.

The issue can be reproduced as below,
    1) create a region with a CXL setup which includes a HB with a
    single root port under which a memdev is attached directly.
    2) destroy the region with cxl destroy-region regionX -f.

Fixes: 176baefb ("cxl/hdm: Commit decoder state to hardware")
Cc: <stable@vger.kernel.org>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Link: https://lore.kernel.org/r/20221215170909.2650271-1-fan.ni@samsung.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

4fa4302d

30 Jan, 2023 3 commits

cxl/pci: Fix irq oneshot expectations · 5a84711f

Dan Williams authored Jan 30, 2023

The IRQ core expects that users of the default hardirq handler specify
IRQF_ONESHOT to keep interrupts disabled until the threaded handler
runs. That meets the CXL driver's expectations since it is an edge
triggered MSI and this flag would have been passed by default using
pci_request_irq() instead of devm_request_threaded_irq().

Fixes: a49aa814 ("cxl/mem: Wire up event interrupts")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

5a84711f

cxl/pci: Set the device timestamp · fa884345

Jonathan Cameron authored Jan 30, 2023

CXL r3.0 section 8.2.9.4.2 "Set Timestamp" recommends that the host sets
the timestamp after every Conventional or CXL Reset to ensure accurate
timestamps. This should include on initial boot up. The time base that
is being set is used by a device for the poison list overflow timestamp
and all event timestamps. Note that the command is optional and if
not supported and the device cannot return accurate timestamps it will
fill the fields in with an appropriate marker (see the specification
description of each timestamp).
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20230130151327.32415-1-Jonathan.Cameron@huawei.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

fa884345

cxl/mbox: Add missing parameter to docs. · 7ebf38c9

Jonathan Cameron authored Jan 30, 2023

Kernel-doc should be complete, so add documentation for the status
parameter.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230130153437.3153-1-Jonathan.Cameron@huawei.comSigned-off-by: Dan Williams <dan.j.williams@intel.com>

7ebf38c9

29 Jan, 2023 6 commits

Linux 6.2-rc6 · 6d796c50
Linus Torvalds authored Jan 29, 2023

6d796c50

Merge tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ab072681

Linus Torvalds authored Jan 29, 2023

Pull irq fix from Borislav Petkov:

 - Cleanup the firmware node for the new IRQ MSI domain properly, to
   avoid leaking memory

* tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/msi: Free the fwnode created by msi_create_device_irq_domain()

ab072681

Merge tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bc6bc34b

Linus Torvalds authored Jan 29, 2023

Pull x86 fixes from Borislav Petkov:

 - Start checking for -mindirect-branch-cs-prefix clang support too now
   that LLVM 16 will support it

 - Fix a NULL ptr deref when suspending with Xen PV

 - Have a SEV-SNP guest check explicitly for features enabled by the
   hypervisor and fail gracefully if some are unsupported by the guest
   instead of failing in a non-obvious and hard-to-debug way

 - Fix a MSI descriptor leakage under Xen

 - Mark Xen's MSI domain as supporting MSI-X

 - Prevent legacy PIC interrupts from being resent in software by
   marking them level triggered, as they should be, which lead to a NULL
   ptr deref

* tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block
  acpi: Fix suspend with Xen PV
  x86/sev: Add SEV-SNP guest feature negotiation support
  x86/pci/xen: Fixup fallout from the PCI/MSI overhaul
  x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain
  x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL

bc6bc34b

Merge tag 'input-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 80826e90

Linus Torvalds authored Jan 29, 2023

Pull input fixes from Dmitry Torokhov:

 - touchpads on HP 15-* laptops switched back to PS/2 emulation mode

 - a quirk for Clevo PCX0DX/TUXEDO XP1511 to make sure keyboard is
   responding after resume

* tag 'input-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - add Clevo PCX0DX to i8042 quirk table
  Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode"

80826e90

Merge tag 'cxl-fixes-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 80392928

Linus Torvalds authored Jan 29, 2023

Pull cxl fixes from Dan Williams:
 "A couple of fixes for bugs introduced during the merge window. One is
  a regression, the other was a bug in the CXL AER handler:

   - Fix a crash regression due to module load order of cxl_pmem.ko

   - Fix wrong register offset read in CXL AER handling path"

* tag 'cxl-fixes-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/pmem: Fix nvdimm unregistration when cxl_pmem driver is absent
  cxl: fix cxl_report_and_clear() RAS UE addr mis-assignment

80392928

Revert "mm/compaction: fix set skip in fast_find_migrateblock" · 95e7a450

Vlastimil Babka authored Jan 13, 2023

This reverts commit 7efc3b72.

We have got openSUSE reports (Link 1) for 6.1 kernel with khugepaged
stalling CPU for long periods of time.  Investigation of tracepoint data
shows that compaction is stuck in repeating fast_find_migrateblock()
based migrate page isolation, and then fails to migrate all isolated
pages.

Commit 7efc3b72 ("mm/compaction: fix set skip in fast_find_migrateblock")
was suspected as it was merged in 6.1 and in theory can indeed remove a
termination condition for fast_find_migrateblock() under certain
conditions, as it removes a place that always marks a scanned pageblock
from being re-scanned.  There are other such places, but those can be
skipped under certain conditions, which seems to match the tracepoint
data.

Testing of revert also appears to have resolved the issue, thus revert
the commit until a more robust solution for the original problem is
developed.

It's also likely this will fix qemu stalls with 6.1 kernel reported in
Link 2, but that is not yet confirmed.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848
Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/
Link: https://lore.kernel.org/lkml/20230125134434.18017-1-mgorman@techsingularity.net/
Fixes: 7efc3b72 ("mm/compaction: fix set skip in fast_find_migrateblock")
Cc: <stable@vger.kernel.org>
Tested-by: Pedro Falcato <pedro.falcato@gmail.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

95e7a450