Commit 7c3dc440 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL) updates from Dan Williams:
 "To date Linux has been dependent on platform-firmware to map CXL RAM
  regions and handle events / errors from devices. With this update we
  can now parse / update the CXL memory layout, and report events /
  errors from devices. This is a precursor for the CXL subsystem to
  handle the end-to-end "RAS" flow for CXL memory. i.e. the flow that
  for DDR-attached-DRAM is handled by the EDAC driver where it maps
  system physical address events to a field-replaceable-unit (FRU /
  endpoint device). In general, CXL has the potential to standardize
  what has historically been a pile of memory-controller-specific error
  handling logic.

  Another change of note is the default policy for handling RAM-backed
  device-dax instances. Previously the default access mode was "device",
  mmap(2) a device special file to access memory. The new default is
  "kmem" where the address range is assigned to the core-mm via
  add_memory_driver_managed(). This saves typical users from wondering
  why their platform memory is not visible via free(1) and stuck behind
  a device-file. At the same time it allows expert users to deploy
  policy to, for example, get dedicated access to high performance
  memory, or hide low performance memory from general purpose kernel
  allocations. This affects not only CXL, but also systems with
  high-bandwidth-memory that platform-firmware tags with the
  EFI_MEMORY_SP (special purpose) designation.

  Summary:

   - CXL RAM region enumeration: instantiate 'struct cxl_region' objects
     for platform firmware created memory regions

   - CXL RAM region provisioning: complement the existing PMEM region
     creation support with RAM region support

   - "Soft Reservation" policy change: Online (memory hot-add)
     soft-reserved memory (EFI_MEMORY_SP) by default, but still allow
     for setting aside such memory for dedicated access via device-dax.

   - CXL Events and Interrupts: Takeover CXL event handling from
     platform-firmware (ACPI calls this CXL Memory Error Reporting) and
     export CXL Events via Linux Trace Events.

   - Convey CXL _OSC results to drivers: Similar to PCI, let the CXL
     subsystem interrogate the result of CXL _OSC negotiation.

   - Emulate CXL DVSEC Range Registers as "decoders": Allow for
     first-generation devices that pre-date the definition of the CXL
     HDM Decoder Capability to translate the CXL DVSEC Range Registers
     into 'struct cxl_decoder' objects.

   - Set timestamp: Per spec, set the device timestamp in case of
     hotplug, or if platform-firwmare failed to set it.

   - General fixups: linux-next build issues, non-urgent fixes for
     pre-production hardware, unit test fixes, spelling and debug
     message improvements"

* tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (66 commits)
  dax/kmem: Fix leak of memory-hotplug resources
  cxl/mem: Add kdoc param for event log driver state
  cxl/trace: Add serial number to trace points
  cxl/trace: Add host output to trace points
  cxl/trace: Standardize device information output
  cxl/pci: Remove locked check for dvsec_range_allowed()
  cxl/hdm: Add emulation when HDM decoders are not committed
  cxl/hdm: Create emulated cxl_hdm for devices that do not have HDM decoders
  cxl/hdm: Emulate HDM decoder from DVSEC range registers
  cxl/pci: Refactor cxl_hdm_decode_init()
  cxl/port: Export cxl_dvsec_rr_decode() to cxl_port
  cxl/pci: Break out range register decoding from cxl_hdm_decode_init()
  cxl: add RAS status unmasking for CXL
  cxl: remove unnecessary calling of pci_enable_pcie_error_reporting()
  dax/hmem: build hmem device support as module if possible
  dax: cxl: add CXL_REGION dependency
  cxl: avoid returning uninitialized error code
  cxl/pmem: Fix nvdimm registration races
  cxl/mem: Fix UAPI command comment
  cxl/uapi: Tag commands from cxl_query_cmd()
  ...
parents d8e47318 e686c325
......@@ -90,6 +90,21 @@ Description:
capability.
What: /sys/bus/cxl/devices/{port,endpoint}X/parent_dport
Date: January, 2023
KernelVersion: v6.3
Contact: linux-cxl@vger.kernel.org
Description:
(RO) CXL port objects are instantiated for each upstream port in
a CXL/PCIe switch, and for each endpoint to map the
corresponding memory device into the CXL port hierarchy. When a
descendant CXL port (switch or endpoint) is enumerated it is
useful to know which 'dport' object in the parent CXL port
routes to this descendant. The 'parent_dport' symlink points to
the device representing the downstream port of a CXL switch that
routes to {port,endpoint}X.
What: /sys/bus/cxl/devices/portX/dportY
Date: June, 2021
KernelVersion: v5.14
......@@ -183,7 +198,7 @@ Description:
What: /sys/bus/cxl/devices/endpointX/CDAT
Date: July, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RO) If this sysfs entry is not present no DOE mailbox was
......@@ -194,7 +209,7 @@ Description:
What: /sys/bus/cxl/devices/decoderX.Y/mode
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
......@@ -214,7 +229,7 @@ Description:
What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
......@@ -225,7 +240,7 @@ Description:
What: /sys/bus/cxl/devices/decoderX.Y/dpa_size
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
......@@ -245,7 +260,7 @@ Description:
What: /sys/bus/cxl/devices/decoderX.Y/interleave_ways
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RO) The number of targets across which this decoder's host
......@@ -260,7 +275,7 @@ Description:
What: /sys/bus/cxl/devices/decoderX.Y/interleave_granularity
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RO) The number of consecutive bytes of host physical address
......@@ -270,25 +285,25 @@ Description:
interleave_granularity).
What: /sys/bus/cxl/devices/decoderX.Y/create_pmem_region
Date: May, 2022
KernelVersion: v5.20
What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
Date: May, 2022, January, 2023
KernelVersion: v6.0 (pmem), v6.3 (ram)
Contact: linux-cxl@vger.kernel.org
Description:
(RW) Write a string in the form 'regionZ' to start the process
of defining a new persistent memory region (interleave-set)
within the decode range bounded by root decoder 'decoderX.Y'.
The value written must match the current value returned from
reading this attribute. An atomic compare exchange operation is
done on write to assign the requested id to a region and
allocate the region-id for the next creation attempt. EBUSY is
returned if the region name written does not match the current
cached value.
of defining a new persistent, or volatile memory region
(interleave-set) within the decode range bounded by root decoder
'decoderX.Y'. The value written must match the current value
returned from reading this attribute. An atomic compare exchange
operation is done on write to assign the requested id to a
region and allocate the region-id for the next creation attempt.
EBUSY is returned if the region name written does not match the
current cached value.
What: /sys/bus/cxl/devices/decoderX.Y/delete_region
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(WO) Write a string in the form 'regionZ' to delete that region,
......@@ -297,17 +312,18 @@ Description:
What: /sys/bus/cxl/devices/regionZ/uuid
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RW) Write a unique identifier for the region. This field must
be set for persistent regions and it must not conflict with the
UUID of another region.
UUID of another region. For volatile ram regions this
attribute is a read-only empty string.
What: /sys/bus/cxl/devices/regionZ/interleave_granularity
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RW) Set the number of consecutive bytes each device in the
......@@ -318,7 +334,7 @@ Description:
What: /sys/bus/cxl/devices/regionZ/interleave_ways
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RW) Configures the number of devices participating in the
......@@ -328,7 +344,7 @@ Description:
What: /sys/bus/cxl/devices/regionZ/size
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RW) System physical address space to be consumed by the region.
......@@ -343,9 +359,20 @@ Description:
results in the same address being allocated.
What: /sys/bus/cxl/devices/regionZ/mode
Date: January, 2023
KernelVersion: v6.3
Contact: linux-cxl@vger.kernel.org
Description:
(RO) The mode of a region is established at region creation time
and dictates the mode of the endpoint decoder that comprise the
region. For more details on the possible modes see
/sys/bus/cxl/devices/decoderX.Y/mode
What: /sys/bus/cxl/devices/regionZ/resource
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RO) A region is a contiguous partition of a CXL root decoder
......@@ -357,7 +384,7 @@ Description:
What: /sys/bus/cxl/devices/regionZ/target[0..N]
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RW) Write an endpoint decoder object name to 'targetX' where X
......@@ -376,7 +403,7 @@ Description:
What: /sys/bus/cxl/devices/regionZ/commit
Date: May, 2022
KernelVersion: v5.20
KernelVersion: v6.0
Contact: linux-cxl@vger.kernel.org
Description:
(RW) Write a boolean 'true' string value to this attribute to
......
......@@ -5912,6 +5912,7 @@ M: Dan Williams <dan.j.williams@intel.com>
M: Vishal Verma <vishal.l.verma@intel.com>
M: Dave Jiang <dave.jiang@intel.com>
L: nvdimm@lists.linux.dev
L: linux-cxl@vger.kernel.org
S: Supported
F: drivers/dax/
......
......@@ -71,7 +71,7 @@ obj-$(CONFIG_FB_INTEL) += video/fbdev/intelfb/
obj-$(CONFIG_PARPORT) += parport/
obj-y += base/ block/ misc/ mfd/ nfc/
obj-$(CONFIG_LIBNVDIMM) += nvdimm/
obj-$(CONFIG_DAX) += dax/
obj-y += dax/
obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
obj-$(CONFIG_NUBUS) += nubus/
obj-y += cxl/
......
......@@ -718,7 +718,7 @@ static void hmat_register_target_devices(struct memory_target *target)
for (res = target->memregions.child; res; res = res->sibling) {
int target_nid = pxm_to_node(target->memory_pxm);
hmem_register_device(target_nid, res);
hmem_register_resource(target_nid, res);
}
}
......@@ -869,4 +869,4 @@ static __init int hmat_init(void)
acpi_put_table(tbl);
return 0;
}
device_initcall(hmat_init);
subsys_initcall(hmat_init);
......@@ -1047,6 +1047,9 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
host_bridge->native_dpc = 0;
if (!(root->osc_ext_control_set & OSC_CXL_ERROR_REPORTING_CONTROL))
host_bridge->native_cxl_error = 0;
/*
* Evaluate the "PCI Boot Configuration" _DSM Function. If it
* exists and returns 0, we must preserve any PCI resource
......
......@@ -104,19 +104,29 @@ config CXL_SUSPEND
depends on SUSPEND && CXL_MEM
config CXL_REGION
bool
bool "CXL: Region Support"
default CXL_BUS
# For MAX_PHYSMEM_BITS
depends on SPARSEMEM
select MEMREGION
select GET_FREE_REGION
help
Enable the CXL core to enumerate and provision CXL regions. A CXL
region is defined by one or more CXL expanders that decode a given
system-physical address range. For CXL regions established by
platform-firmware this option enables memory error handling to
identify the devices participating in a given interleaved memory
range. Otherwise, platform-firmware managed CXL is enabled by being
placed in the system address map and does not need a driver.
If unsure say 'y'
config CXL_REGION_INVALIDATION_TEST
bool "CXL: Region Cache Management Bypass (TEST)"
depends on CXL_REGION
help
CXL Region management and security operations potentially invalidate
the content of CPU caches without notifiying those caches to
the content of CPU caches without notifying those caches to
invalidate the affected cachelines. The CXL Region driver attempts
to invalidate caches when those events occur. If that invalidation
fails the region will fail to enable. Reasons for cache
......
......@@ -19,7 +19,7 @@ struct cxl_cxims_data {
/*
* Find a targets entry (n) in the host bridge interleave list.
* CXL Specfication 3.0 Table 9-22
* CXL Specification 3.0 Table 9-22
*/
static int cxl_xor_calc_n(u64 hpa, struct cxl_cxims_data *cximsd, int iw,
int ig)
......@@ -731,7 +731,8 @@ static void __exit cxl_acpi_exit(void)
cxl_bus_drain();
}
module_init(cxl_acpi_init);
/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
subsys_initcall(cxl_acpi_init);
module_exit(cxl_acpi_exit);
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);
......
......@@ -3,6 +3,8 @@ obj-$(CONFIG_CXL_BUS) += cxl_core.o
obj-$(CONFIG_CXL_SUSPEND) += suspend.o
ccflags-y += -I$(srctree)/drivers/cxl
CFLAGS_trace.o = -DTRACE_INCLUDE_PATH=. -I$(src)
cxl_core-y := port.o
cxl_core-y += pmem.o
cxl_core-y += regs.o
......@@ -10,4 +12,5 @@ cxl_core-y += memdev.o
cxl_core-y += mbox.o
cxl_core-y += pci.o
cxl_core-y += hdm.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
......@@ -11,15 +11,18 @@ extern struct attribute_group cxl_base_attribute_group;
#ifdef CONFIG_CXL_REGION
extern struct device_attribute dev_attr_create_pmem_region;
extern struct device_attribute dev_attr_create_ram_region;
extern struct device_attribute dev_attr_delete_region;
extern struct device_attribute dev_attr_region;
extern const struct device_type cxl_pmem_region_type;
extern const struct device_type cxl_dax_region_type;
extern const struct device_type cxl_region_type;
void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
#define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
#define CXL_REGION_TYPE(x) (&cxl_region_type)
#define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr),
#define CXL_PMEM_REGION_TYPE(x) (&cxl_pmem_region_type)
#define CXL_DAX_REGION_TYPE(x) (&cxl_dax_region_type)
int cxl_region_init(void);
void cxl_region_exit(void);
#else
......@@ -37,6 +40,7 @@ static inline void cxl_region_exit(void)
#define CXL_REGION_TYPE(x) NULL
#define SET_CXL_REGION_ATTR(x)
#define CXL_PMEM_REGION_TYPE(x) NULL
#define CXL_DAX_REGION_TYPE(x) NULL
#endif
struct cxl_send_command;
......@@ -56,9 +60,6 @@ resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled);
extern struct rw_semaphore cxl_dpa_rwsem;
bool is_switch_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
int cxl_memdev_init(void);
void cxl_memdev_exit(void);
void cxl_mbox_init(void);
......
......@@ -101,11 +101,34 @@ static int map_hdm_decoder_regs(struct cxl_port *port, void __iomem *crb,
BIT(CXL_CM_CAP_CAP_ID_HDM));
}
static struct cxl_hdm *devm_cxl_setup_emulated_hdm(struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info)
{
struct device *dev = &port->dev;
struct cxl_hdm *cxlhdm;
if (!info->mem_enabled)
return ERR_PTR(-ENODEV);
cxlhdm = devm_kzalloc(dev, sizeof(*cxlhdm), GFP_KERNEL);
if (!cxlhdm)
return ERR_PTR(-ENOMEM);
cxlhdm->port = port;
cxlhdm->decoder_count = info->ranges;
cxlhdm->target_count = info->ranges;
dev_set_drvdata(&port->dev, cxlhdm);
return cxlhdm;
}
/**
* devm_cxl_setup_hdm - map HDM decoder component registers
* @port: cxl_port to map
* @info: cached DVSEC range register info
*/
struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info)
{
struct device *dev = &port->dev;
struct cxl_hdm *cxlhdm;
......@@ -119,6 +142,9 @@ struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port)
cxlhdm->port = port;
crb = ioremap(port->component_reg_phys, CXL_COMPONENT_REG_BLOCK_SIZE);
if (!crb) {
if (info && info->mem_enabled)
return devm_cxl_setup_emulated_hdm(port, info);
dev_err(dev, "No component registers mapped\n");
return ERR_PTR(-ENXIO);
}
......@@ -279,7 +305,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
return 0;
}
static int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped)
{
......@@ -295,6 +321,7 @@ static int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_dpa_reserve, CXL);
resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
{
......@@ -676,12 +703,71 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
port->commit_end--;
cxld->flags &= ~CXL_DECODER_F_ENABLE;
/* Userspace is now responsible for reconfiguring this decoder */
if (is_endpoint_decoder(&cxld->dev)) {
struct cxl_endpoint_decoder *cxled;
cxled = to_cxl_endpoint_decoder(&cxld->dev);
cxled->state = CXL_DECODER_STATE_MANUAL;
}
return 0;
}
static int cxl_setup_hdm_decoder_from_dvsec(struct cxl_port *port,
struct cxl_decoder *cxld, int which,
struct cxl_endpoint_dvsec_info *info)
{
if (!is_cxl_endpoint(port))
return -EOPNOTSUPP;
if (!range_len(&info->dvsec_range[which]))
return -ENOENT;
cxld->target_type = CXL_DECODER_EXPANDER;
cxld->commit = NULL;
cxld->reset = NULL;
cxld->hpa_range = info->dvsec_range[which];
/*
* Set the emulated decoder as locked pending additional support to
* change the range registers at run time.
*/
cxld->flags |= CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK;
port->commit_end = cxld->id;
return 0;
}
static bool should_emulate_decoders(struct cxl_port *port)
{
struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
void __iomem *hdm = cxlhdm->regs.hdm_decoder;
u32 ctrl;
int i;
if (!is_cxl_endpoint(cxlhdm->port))
return false;
if (!hdm)
return true;
/*
* If any decoders are committed already, there should not be any
* emulated DVSEC decoders.
*/
for (i = 0; i < cxlhdm->decoder_count; i++) {
ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(i));
if (FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl))
return false;
}
return true;
}
static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
int *target_map, void __iomem *hdm, int which,
u64 *dpa_base)
u64 *dpa_base, struct cxl_endpoint_dvsec_info *info)
{
struct cxl_endpoint_decoder *cxled = NULL;
u64 size, base, skip, dpa_size;
......@@ -694,6 +780,9 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
unsigned char target_id[8];
} target_list;
if (should_emulate_decoders(port))
return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
if (is_endpoint_decoder(&cxld->dev))
cxled = to_cxl_endpoint_decoder(&cxld->dev);
......@@ -717,6 +806,9 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
.end = base + size - 1,
};
if (cxled && !committed && range_len(&info->dvsec_range[which]))
return cxl_setup_hdm_decoder_from_dvsec(port, cxld, which, info);
/* decoders are enabled if committed */
if (committed) {
cxld->flags |= CXL_DECODER_F_ENABLE;
......@@ -783,21 +875,21 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
return rc;
}
*dpa_base += dpa_size + skip;
cxled->state = CXL_DECODER_STATE_AUTO;
return 0;
}
/**
* devm_cxl_enumerate_decoders - add decoder objects per HDM register set
* @cxlhdm: Structure to populate with HDM capabilities
*/
int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
static void cxl_settle_decoders(struct cxl_hdm *cxlhdm)
{
void __iomem *hdm = cxlhdm->regs.hdm_decoder;
struct cxl_port *port = cxlhdm->port;
int i, committed;
u64 dpa_base = 0;
int committed, i;
u32 ctrl;
if (!hdm)
return;
/*
* Since the register resource was recently claimed via request_region()
* be careful about trusting the "not-committed" status until the commit
......@@ -814,6 +906,22 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
/* ensure that future checks of committed can be trusted */
if (committed != cxlhdm->decoder_count)
msleep(20);
}
/**
* devm_cxl_enumerate_decoders - add decoder objects per HDM register set
* @cxlhdm: Structure to populate with HDM capabilities
* @info: cached DVSEC range register info
*/
int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info)
{
void __iomem *hdm = cxlhdm->regs.hdm_decoder;
struct cxl_port *port = cxlhdm->port;
int i;
u64 dpa_base = 0;
cxl_settle_decoders(cxlhdm);
for (i = 0; i < cxlhdm->decoder_count; i++) {
int target_map[CXL_DECODER_MAX_INTERLEAVE] = { 0 };
......@@ -826,7 +934,8 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
cxled = cxl_endpoint_decoder_alloc(port);
if (IS_ERR(cxled)) {
dev_warn(&port->dev,
"Failed to allocate the decoder\n");
"Failed to allocate decoder%d.%d\n",
port->id, i);
return PTR_ERR(cxled);
}
cxld = &cxled->cxld;
......@@ -836,21 +945,26 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
cxlsd = cxl_switch_decoder_alloc(port, target_count);
if (IS_ERR(cxlsd)) {
dev_warn(&port->dev,
"Failed to allocate the decoder\n");
"Failed to allocate decoder%d.%d\n",
port->id, i);
return PTR_ERR(cxlsd);
}
cxld = &cxlsd->cxld;
}
rc = init_hdm_decoder(port, cxld, target_map, hdm, i, &dpa_base);
rc = init_hdm_decoder(port, cxld, target_map, hdm, i,
&dpa_base, info);
if (rc) {
dev_warn(&port->dev,
"Failed to initialize decoder%d.%d\n",
port->id, i);
put_device(&cxld->dev);
return rc;
}
rc = add_hdm_decoder(port, cxld, target_map);
if (rc) {
dev_warn(&port->dev,
"Failed to add decoder to port\n");
"Failed to add decoder%d.%d\n", port->id, i);
return rc;
}
}
......
......@@ -3,11 +3,13 @@
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/security.h>
#include <linux/debugfs.h>
#include <linux/ktime.h>
#include <linux/mutex.h>
#include <cxlmem.h>
#include <cxl.h>
#include "core.h"
#include "trace.h"
static bool cxl_raw_allow_all;
......@@ -170,6 +172,12 @@ int cxl_internal_send_cmd(struct cxl_dev_state *cxlds,
out_size = mbox_cmd->size_out;
min_out = mbox_cmd->min_out;
rc = cxlds->mbox_send(cxlds, mbox_cmd);
/*
* EIO is reserved for a payload size mismatch and mbox_send()
* may not return this error.
*/
if (WARN_ONCE(rc == -EIO, "Bad return code: -EIO"))
return -ENXIO;
if (rc)
return rc;
......@@ -445,9 +453,14 @@ int cxl_query_cmd(struct cxl_memdev *cxlmd,
* structures.
*/
cxl_for_each_cmd(cmd) {
const struct cxl_command_info *info = &cmd->info;
struct cxl_command_info info = cmd->info;
if (copy_to_user(&q->commands[j++], info, sizeof(*info)))
if (test_bit(info.id, cxlmd->cxlds->enabled_cmds))
info.flags |= CXL_MEM_COMMAND_FLAG_ENABLED;
if (test_bit(info.id, cxlmd->cxlds->exclusive_cmds))
info.flags |= CXL_MEM_COMMAND_FLAG_EXCLUSIVE;
if (copy_to_user(&q->commands[j++], &info, sizeof(info)))
return -EFAULT;
if (j == n_commands)
......@@ -550,9 +563,9 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s)
return 0;
}
static int cxl_xfer_log(struct cxl_dev_state *cxlds, uuid_t *uuid, u32 size, u8 *out)
static int cxl_xfer_log(struct cxl_dev_state *cxlds, uuid_t *uuid, u32 *size, u8 *out)
{
u32 remaining = size;
u32 remaining = *size;
u32 offset = 0;
while (remaining) {
......@@ -576,6 +589,17 @@ static int cxl_xfer_log(struct cxl_dev_state *cxlds, uuid_t *uuid, u32 size, u8
};
rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
/*
* The output payload length that indicates the number
* of valid bytes can be smaller than the Log buffer
* size.
*/
if (rc == -EIO && mbox_cmd.size_out < xfer_size) {
offset += mbox_cmd.size_out;
break;
}
if (rc < 0)
return rc;
......@@ -584,6 +608,8 @@ static int cxl_xfer_log(struct cxl_dev_state *cxlds, uuid_t *uuid, u32 size, u8
offset += xfer_size;
}
*size = offset;
return 0;
}
......@@ -610,11 +636,12 @@ static void cxl_walk_cel(struct cxl_dev_state *cxlds, size_t size, u8 *cel)
if (!cmd) {
dev_dbg(cxlds->dev,
"Opcode 0x%04x unsupported by driver", opcode);
"Opcode 0x%04x unsupported by driver\n", opcode);
continue;
}
set_bit(cmd->info.id, cxlds->enabled_cmds);
dev_dbg(cxlds->dev, "Opcode 0x%04x enabled\n", opcode);
}
}
......@@ -694,7 +721,7 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
goto out;
}
rc = cxl_xfer_log(cxlds, &uuid, size, log);
rc = cxl_xfer_log(cxlds, &uuid, &size, log);
if (rc) {
kvfree(log);
goto out;
......@@ -717,6 +744,203 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
/*
* General Media Event Record
* CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
*/
static const uuid_t gen_media_event_uuid =
UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
/*
* DRAM Event Record
* CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
*/
static const uuid_t dram_event_uuid =
UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
/*
* Memory Module Event Record
* CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
*/
static const uuid_t mem_mod_event_uuid =
UUID_INIT(0xfe927475, 0xdd59, 0x4339,
0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
struct cxl_event_record_raw *record)
{
uuid_t *id = &record->hdr.id;
if (uuid_equal(id, &gen_media_event_uuid)) {
struct cxl_event_gen_media *rec =
(struct cxl_event_gen_media *)record;
trace_cxl_general_media(cxlmd, type, rec);
} else if (uuid_equal(id, &dram_event_uuid)) {
struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
trace_cxl_dram(cxlmd, type, rec);
} else if (uuid_equal(id, &mem_mod_event_uuid)) {
struct cxl_event_mem_module *rec =
(struct cxl_event_mem_module *)record;
trace_cxl_memory_module(cxlmd, type, rec);
} else {
/* For unknown record types print just the header */
trace_cxl_generic_event(cxlmd, type, record);
}
}
static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
enum cxl_event_log_type log,
struct cxl_get_event_payload *get_pl)
{
struct cxl_mbox_clear_event_payload *payload;
u16 total = le16_to_cpu(get_pl->record_count);
u8 max_handles = CXL_CLEAR_EVENT_MAX_HANDLES;
size_t pl_size = struct_size(payload, handles, max_handles);
struct cxl_mbox_cmd mbox_cmd;
u16 cnt;
int rc = 0;
int i;
/* Payload size may limit the max handles */
if (pl_size > cxlds->payload_size) {
max_handles = (cxlds->payload_size - sizeof(*payload)) /
sizeof(__le16);
pl_size = struct_size(payload, handles, max_handles);
}
payload = kvzalloc(pl_size, GFP_KERNEL);
if (!payload)
return -ENOMEM;
*payload = (struct cxl_mbox_clear_event_payload) {
.event_log = log,
};
mbox_cmd = (struct cxl_mbox_cmd) {
.opcode = CXL_MBOX_OP_CLEAR_EVENT_RECORD,
.payload_in = payload,
.size_in = pl_size,
};
/*
* Clear Event Records uses u8 for the handle cnt while Get Event
* Record can return up to 0xffff records.
*/
i = 0;
for (cnt = 0; cnt < total; cnt++) {
payload->handles[i++] = get_pl->records[cnt].hdr.handle;
dev_dbg(cxlds->dev, "Event log '%d': Clearing %u\n",
log, le16_to_cpu(payload->handles[i]));
if (i == max_handles) {
payload->nr_recs = i;
rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
if (rc)
goto free_pl;
i = 0;
}
}
/* Clear what is left if any */
if (i) {
payload->nr_recs = i;
mbox_cmd.size_in = struct_size(payload, handles, i);
rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
if (rc)
goto free_pl;
}
free_pl:
kvfree(payload);
return rc;
}
static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
enum cxl_event_log_type type)
{
struct cxl_get_event_payload *payload;
struct cxl_mbox_cmd mbox_cmd;
u8 log_type = type;
u16 nr_rec;
mutex_lock(&cxlds->event.log_lock);
payload = cxlds->event.buf;
mbox_cmd = (struct cxl_mbox_cmd) {
.opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
.payload_in = &log_type,
.size_in = sizeof(log_type),
.payload_out = payload,
.size_out = cxlds->payload_size,
.min_out = struct_size(payload, records, 0),
};
do {
int rc, i;
rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
if (rc) {
dev_err_ratelimited(cxlds->dev,
"Event log '%d': Failed to query event records : %d",
type, rc);
break;
}
nr_rec = le16_to_cpu(payload->record_count);
if (!nr_rec)
break;
for (i = 0; i < nr_rec; i++)
cxl_event_trace_record(cxlds->cxlmd, type,
&payload->records[i]);
if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
trace_cxl_overflow(cxlds->cxlmd, type, payload);
rc = cxl_clear_event_record(cxlds, type, payload);
if (rc) {
dev_err_ratelimited(cxlds->dev,
"Event log '%d': Failed to clear events : %d",
type, rc);
break;
}
} while (nr_rec);
mutex_unlock(&cxlds->event.log_lock);
}
/**
* cxl_mem_get_event_records - Get Event Records from the device
* @cxlds: The device data for the operation
* @status: Event Status register value identifying which events are available.
*
* Retrieve all event records available on the device, report them as trace
* events, and clear them.
*
* See CXL rev 3.0 @8.2.9.2.2 Get Event Records
* See CXL rev 3.0 @8.2.9.2.3 Clear Event Records
*/
void cxl_mem_get_event_records(struct cxl_dev_state *cxlds, u32 status)
{
dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
if (status & CXLDEV_EVENT_STATUS_FATAL)
cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
if (status & CXLDEV_EVENT_STATUS_FAIL)
cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
if (status & CXLDEV_EVENT_STATUS_WARN)
cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
if (status & CXLDEV_EVENT_STATUS_INFO)
cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
/**
* cxl_mem_get_partition_info - Get partition info
* @cxlds: The device data for the operation
......@@ -857,6 +1081,32 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, CXL);
int cxl_set_timestamp(struct cxl_dev_state *cxlds)
{
struct cxl_mbox_cmd mbox_cmd;
struct cxl_mbox_set_timestamp_in pi;
int rc;
pi.timestamp = cpu_to_le64(ktime_get_real_ns());
mbox_cmd = (struct cxl_mbox_cmd) {
.opcode = CXL_MBOX_OP_SET_TIMESTAMP,
.size_in = sizeof(pi),
.payload_in = &pi,
};
rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
/*
* Command is optional. Devices may have another way of providing
* a timestamp, or may return all 0s in timestamp fields.
* Don't report an error if this command isn't supported
*/
if (rc && (mbox_cmd.return_code != CXL_MBOX_CMD_RC_UNSUPPORTED))
return rc;
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
{
struct cxl_dev_state *cxlds;
......@@ -868,6 +1118,7 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
}
mutex_init(&cxlds->mbox_mutex);
mutex_init(&cxlds->event.log_lock);
cxlds->dev = dev;
return cxlds;
......
......@@ -242,10 +242,11 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
if (!cxlmd)
return ERR_PTR(-ENOMEM);
rc = ida_alloc_range(&cxl_memdev_ida, 0, CXL_MEM_MAX_DEVS, GFP_KERNEL);
rc = ida_alloc_max(&cxl_memdev_ida, CXL_MEM_MAX_DEVS - 1, GFP_KERNEL);
if (rc < 0)
goto err;
cxlmd->id = rc;
cxlmd->depth = -1;
dev = &cxlmd->dev;
device_initialize(dev);
......
This diff is collapsed.
......@@ -46,6 +46,8 @@ static int cxl_device_id(const struct device *dev)
return CXL_DEVICE_NVDIMM;
if (dev->type == CXL_PMEM_REGION_TYPE())
return CXL_DEVICE_PMEM_REGION;
if (dev->type == CXL_DAX_REGION_TYPE())
return CXL_DEVICE_DAX_REGION;
if (is_cxl_port(dev)) {
if (is_cxl_root(to_cxl_port(dev)))
return CXL_DEVICE_ROOT;
......@@ -180,17 +182,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
switch (cxled->mode) {
case CXL_DECODER_RAM:
return sysfs_emit(buf, "ram\n");
case CXL_DECODER_PMEM:
return sysfs_emit(buf, "pmem\n");
case CXL_DECODER_NONE:
return sysfs_emit(buf, "none\n");
case CXL_DECODER_MIXED:
default:
return sysfs_emit(buf, "mixed\n");
}
return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
}
static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
......@@ -304,6 +296,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
&dev_attr_cap_type3.attr,
&dev_attr_target_list.attr,
SET_CXL_REGION_ATTR(create_pmem_region)
SET_CXL_REGION_ATTR(create_ram_region)
SET_CXL_REGION_ATTR(delete_region)
NULL,
};
......@@ -315,6 +308,13 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
return (cxlrd->cxlsd.cxld.flags & flags) == flags;
}
static bool can_create_ram(struct cxl_root_decoder *cxlrd)
{
unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
return (cxlrd->cxlsd.cxld.flags & flags) == flags;
}
static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
......@@ -323,7 +323,11 @@ static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *
if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd))
return 0;
if (a == CXL_REGION_ATTR(delete_region) && !can_create_pmem(cxlrd))
if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
return 0;
if (a == CXL_REGION_ATTR(delete_region) &&
!(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
return 0;
return a->mode;
......@@ -444,6 +448,7 @@ bool is_endpoint_decoder(struct device *dev)
{
return dev->type == &cxl_decoder_endpoint_type;
}
EXPORT_SYMBOL_NS_GPL(is_endpoint_decoder, CXL);
bool is_root_decoder(struct device *dev)
{
......@@ -455,6 +460,7 @@ bool is_switch_decoder(struct device *dev)
{
return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
}
EXPORT_SYMBOL_NS_GPL(is_switch_decoder, CXL);
struct cxl_decoder *to_cxl_decoder(struct device *dev)
{
......@@ -482,6 +488,7 @@ struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
return NULL;
return container_of(dev, struct cxl_switch_decoder, cxld.dev);
}
EXPORT_SYMBOL_NS_GPL(to_cxl_switch_decoder, CXL);
static void cxl_ep_release(struct cxl_ep *ep)
{
......@@ -583,6 +590,29 @@ static int devm_cxl_link_uport(struct device *host, struct cxl_port *port)
return devm_add_action_or_reset(host, cxl_unlink_uport, port);
}
static void cxl_unlink_parent_dport(void *_port)
{
struct cxl_port *port = _port;
sysfs_remove_link(&port->dev.kobj, "parent_dport");
}
static int devm_cxl_link_parent_dport(struct device *host,
struct cxl_port *port,
struct cxl_dport *parent_dport)
{
int rc;
if (!parent_dport)
return 0;
rc = sysfs_create_link(&port->dev.kobj, &parent_dport->dport->kobj,
"parent_dport");
if (rc)
return rc;
return devm_add_action_or_reset(host, cxl_unlink_parent_dport, port);
}
static struct lock_class_key cxl_port_key;
static struct cxl_port *cxl_port_alloc(struct device *uport,
......@@ -692,6 +722,10 @@ static struct cxl_port *__devm_cxl_add_port(struct device *host,
if (rc)
return ERR_PTR(rc);
rc = devm_cxl_link_parent_dport(host, port, parent_dport);
if (rc)
return ERR_PTR(rc);
return port;
err:
......@@ -1137,7 +1171,7 @@ static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port,
}
/*
* All users of grandparent() are using it to walk PCIe-like swich port
* All users of grandparent() are using it to walk PCIe-like switch port
* hierarchy. A PCIe switch is comprised of a bridge device representing the
* upstream switch port and N bridges representing downstream switch ports. When
* bridges stack the grand-parent of a downstream switch port is another
......@@ -1164,6 +1198,7 @@ static void delete_endpoint(void *data)
device_lock(parent);
if (parent->driver && !endpoint->dead) {
devm_release_action(parent, cxl_unlink_parent_dport, endpoint);
devm_release_action(parent, cxl_unlink_uport, endpoint);
devm_release_action(parent, unregister_port, endpoint);
}
......@@ -1179,6 +1214,7 @@ int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
get_device(&endpoint->dev);
dev_set_drvdata(dev, endpoint);
cxlmd->depth = endpoint->depth;
return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
......@@ -1194,6 +1230,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
*/
static void delete_switch_port(struct cxl_port *port)
{
devm_release_action(port->dev.parent, cxl_unlink_parent_dport, port);
devm_release_action(port->dev.parent, cxl_unlink_uport, port);
devm_release_action(port->dev.parent, unregister_port, port);
}
......@@ -1212,50 +1249,55 @@ static void reap_dports(struct cxl_port *port)
}
}
struct detach_ctx {
struct cxl_memdev *cxlmd;
int depth;
};
static int port_has_memdev(struct device *dev, const void *data)
{
const struct detach_ctx *ctx = data;
struct cxl_port *port;
if (!is_cxl_port(dev))
return 0;
port = to_cxl_port(dev);
if (port->depth != ctx->depth)
return 0;
return !!cxl_ep_load(port, ctx->cxlmd);
}
static void cxl_detach_ep(void *data)
{
struct cxl_memdev *cxlmd = data;
struct device *iter;
for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) {
struct device *dport_dev = grandparent(iter);
for (int i = cxlmd->depth - 1; i >= 1; i--) {
struct cxl_port *port, *parent_port;
struct detach_ctx ctx = {
.cxlmd = cxlmd,
.depth = i,
};
struct device *dev;
struct cxl_ep *ep;
bool died = false;
if (!dport_dev)
break;
port = find_cxl_port(dport_dev, NULL);
if (!port)
dev = bus_find_device(&cxl_bus_type, NULL, &ctx,
port_has_memdev);
if (!dev)
continue;
if (is_cxl_root(port)) {
put_device(&port->dev);
continue;
}
port = to_cxl_port(dev);
parent_port = to_cxl_port(port->dev.parent);
device_lock(&parent_port->dev);
if (!parent_port->dev.driver) {
/*
* The bottom-up race to delete the port lost to a
* top-down port disable, give up here, because the
* parent_port ->remove() will have cleaned up all
* descendants.
*/
device_unlock(&parent_port->dev);
put_device(&port->dev);
continue;
}
device_lock(&port->dev);
ep = cxl_ep_load(port, cxlmd);
dev_dbg(&cxlmd->dev, "disconnect %s from %s\n",
ep ? dev_name(ep->ep) : "", dev_name(&port->dev));
cxl_ep_remove(port, ep);
if (ep && !port->dead && xa_empty(&port->endpoints) &&
!is_cxl_root(parent_port)) {
!is_cxl_root(parent_port) && parent_port->dev.driver) {
/*
* This was the last ep attached to a dynamically
* enumerated port. Block new cxl_add_ep() and garbage
......@@ -1591,6 +1633,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
}
cxlrd->calc_hb = calc_hb;
mutex_init(&cxlrd->range_lock);
cxld = &cxlsd->cxld;
cxld->dev.type = &cxl_decoder_root_type;
......@@ -1974,6 +2017,6 @@ static void cxl_core_exit(void)
debugfs_remove_recursive(cxl_debugfs);
}
module_init(cxl_core_init);
subsys_initcall(cxl_core_init);
module_exit(cxl_core_exit);
MODULE_LICENSE("GPL v2");
This diff is collapsed.
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
#define CREATE_TRACE_POINTS
#include "trace.h"
This diff is collapsed.
......@@ -130,6 +130,7 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
#define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
#define CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
#define CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
#define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
#define CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
#define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
......@@ -140,6 +141,8 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXL_RAS_CAP_CONTROL_FE_MASK GENMASK(5, 0)
#define CXL_RAS_HEADER_LOG_OFFSET 0x18
#define CXL_RAS_CAPABILITY_LENGTH 0x58
#define CXL_HEADERLOG_SIZE SZ_512
#define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
/* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
#define CXLDEV_CAP_ARRAY_OFFSET 0x0
......@@ -154,6 +157,22 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
#define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
/* CXL 3.0 8.2.8.3.1 Event Status Register */
#define CXLDEV_DEV_EVENT_STATUS_OFFSET 0x00
#define CXLDEV_EVENT_STATUS_INFO BIT(0)
#define CXLDEV_EVENT_STATUS_WARN BIT(1)
#define CXLDEV_EVENT_STATUS_FAIL BIT(2)
#define CXLDEV_EVENT_STATUS_FATAL BIT(3)
#define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO | \
CXLDEV_EVENT_STATUS_WARN | \
CXLDEV_EVENT_STATUS_FAIL | \
CXLDEV_EVENT_STATUS_FATAL)
/* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */
#define CXLDEV_EVENT_INT_MODE_MASK GENMASK(1, 0)
#define CXLDEV_EVENT_INT_MSGNUM_MASK GENMASK(7, 4)
/* CXL 2.0 8.2.8.4 Mailbox Registers */
#define CXLDEV_MBOX_CAPS_OFFSET 0x00
#define CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
......@@ -259,6 +278,8 @@ resource_size_t cxl_rcrb_to_component(struct device *dev,
* cxl_decoder flags that define the type of memory / devices this
* decoder supports as well as configuration lock status See "CXL 2.0
* 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
* Additionally indicate whether decoder settings were autodetected,
* user customized.
*/
#define CXL_DECODER_F_RAM BIT(0)
#define CXL_DECODER_F_PMEM BIT(1)
......@@ -318,12 +339,36 @@ enum cxl_decoder_mode {
CXL_DECODER_DEAD,
};
static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
{
static const char * const names[] = {
[CXL_DECODER_NONE] = "none",
[CXL_DECODER_RAM] = "ram",
[CXL_DECODER_PMEM] = "pmem",
[CXL_DECODER_MIXED] = "mixed",
};
if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
return names[mode];
return "mixed";
}
/*
* Track whether this decoder is reserved for region autodiscovery, or
* free for userspace provisioning.
*/
enum cxl_decoder_state {
CXL_DECODER_STATE_MANUAL,
CXL_DECODER_STATE_AUTO,
};
/**
* struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder
* @cxld: base cxl_decoder_object
* @dpa_res: actively claimed DPA span of this decoder
* @skip: offset into @dpa_res where @cxld.hpa_range maps
* @mode: which memory type / access-mode-partition this decoder targets
* @state: autodiscovery state
* @pos: interleave position in @cxld.region
*/
struct cxl_endpoint_decoder {
......@@ -331,6 +376,7 @@ struct cxl_endpoint_decoder {
struct resource *dpa_res;
resource_size_t skip;
enum cxl_decoder_mode mode;
enum cxl_decoder_state state;
int pos;
};
......@@ -364,6 +410,7 @@ typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd,
* @region_id: region id for next region provisioning event
* @calc_hb: which host bridge covers the n'th position by granularity
* @platform_data: platform specific configuration data
* @range_lock: sync region autodiscovery by address range
* @cxlsd: base cxl switch decoder
*/
struct cxl_root_decoder {
......@@ -371,6 +418,7 @@ struct cxl_root_decoder {
atomic_t region_id;
cxl_calc_hb_fn calc_hb;
void *platform_data;
struct mutex range_lock;
struct cxl_switch_decoder cxlsd;
};
......@@ -420,6 +468,13 @@ struct cxl_region_params {
*/
#define CXL_REGION_F_INCOHERENT 0
/*
* Indicate whether this region has been assembled by autodetection or
* userspace assembly. Prevent endpoint decoders outside of automatic
* detection from being added to the region.
*/
#define CXL_REGION_F_AUTO 1
/**
* struct cxl_region - CXL region
* @dev: This region's device
......@@ -475,6 +530,12 @@ struct cxl_pmem_region {
struct cxl_pmem_region_mapping mapping[];
};
struct cxl_dax_region {
struct device dev;
struct cxl_region *cxlr;
struct range hpa_range;
};
/**
* struct cxl_port - logical collection of upstream port devices and
* downstream port devices to construct a CXL memory
......@@ -615,8 +676,10 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
struct cxl_decoder *to_cxl_decoder(struct device *dev);
struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
bool is_root_decoder(struct device *dev);
bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
......@@ -630,10 +693,26 @@ int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld);
int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
/**
* struct cxl_endpoint_dvsec_info - Cached DVSEC info
* @mem_enabled: cached value of mem_enabled in the DVSEC, PCIE_DEVICE
* @ranges: Number of active HDM ranges this device uses.
* @dvsec_range: cached attributes of the ranges in the DVSEC, PCIE_DEVICE
*/
struct cxl_endpoint_dvsec_info {
bool mem_enabled;
int ranges;
struct range dvsec_range[2];
};
struct cxl_hdm;
struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port);
int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm);
struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info);
int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
int devm_cxl_add_passthrough_decoder(struct cxl_port *port);
int cxl_dvsec_rr_decode(struct device *dev, int dvsec,
struct cxl_endpoint_dvsec_info *info);
bool is_cxl_region(struct device *dev);
......@@ -667,6 +746,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv);
#define CXL_DEVICE_MEMORY_EXPANDER 5
#define CXL_DEVICE_REGION 6
#define CXL_DEVICE_PMEM_REGION 7
#define CXL_DEVICE_DAX_REGION 8
#define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
#define CXL_MODALIAS_FMT "cxl:t%d"
......@@ -683,6 +763,9 @@ struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *dev);
#ifdef CONFIG_CXL_REGION
bool is_cxl_pmem_region(struct device *dev);
struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
int cxl_add_to_region(struct cxl_port *root,
struct cxl_endpoint_decoder *cxled);
struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
#else
static inline bool is_cxl_pmem_region(struct device *dev)
{
......@@ -692,6 +775,15 @@ static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
{
return NULL;
}
static inline int cxl_add_to_region(struct cxl_port *root,
struct cxl_endpoint_decoder *cxled)
{
return 0;
}
static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
{
return NULL;
}
#endif
/*
......
......@@ -4,6 +4,7 @@
#define __CXL_MEM_H__
#include <uapi/linux/cxl_mem.h>
#include <linux/cdev.h>
#include <linux/uuid.h>
#include "cxl.h"
/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
......@@ -38,6 +39,7 @@
* @cxl_nvb: coordinate removal of @cxl_nvd if present
* @cxl_nvd: optional bridge to an nvdimm if the device supports pmem
* @id: id number of this memdev instance.
* @depth: endpoint port depth
*/
struct cxl_memdev {
struct device dev;
......@@ -47,6 +49,7 @@ struct cxl_memdev {
struct cxl_nvdimm_bridge *cxl_nvb;
struct cxl_nvdimm *cxl_nvd;
int id;
int depth;
};
static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
......@@ -79,6 +82,9 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
}
struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped);
static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
struct cxl_memdev *cxlmd)
......@@ -182,15 +188,31 @@ static inline int cxl_mbox_cmd_rc2errno(struct cxl_mbox_cmd *mbox_cmd)
#define CXL_CAPACITY_MULTIPLIER SZ_256M
/**
* struct cxl_endpoint_dvsec_info - Cached DVSEC info
* @mem_enabled: cached value of mem_enabled in the DVSEC, PCIE_DEVICE
* @ranges: Number of active HDM ranges this device uses.
* @dvsec_range: cached attributes of the ranges in the DVSEC, PCIE_DEVICE
* Event Interrupt Policy
*
* CXL rev 3.0 section 8.2.9.2.4; Table 8-52
*/
enum cxl_event_int_mode {
CXL_INT_NONE = 0x00,
CXL_INT_MSI_MSIX = 0x01,
CXL_INT_FW = 0x02
};
struct cxl_event_interrupt_policy {
u8 info_settings;
u8 warn_settings;
u8 failure_settings;
u8 fatal_settings;
} __packed;
/**
* struct cxl_event_state - Event log driver state
*
* @event_buf: Buffer to receive event data
* @event_log_lock: Serialize event_buf and log use
*/
struct cxl_endpoint_dvsec_info {
bool mem_enabled;
int ranges;
struct range dvsec_range[2];
struct cxl_event_state {
struct cxl_get_event_payload *buf;
struct mutex log_lock;
};
/**
......@@ -228,6 +250,7 @@ struct cxl_endpoint_dvsec_info {
* @info: Cached DVSEC information about the device.
* @serial: PCIe Device Serial Number
* @doe_mbs: PCI DOE mailbox array
* @event: event log driver state
* @mbox_send: @dev specific transport for transmitting mailbox commands
*
* See section 8.2.9.5.2 Capacity Configuration and Label Storage for
......@@ -266,14 +289,21 @@ struct cxl_dev_state {
struct xarray doe_mbs;
struct cxl_event_state event;
int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
};
enum cxl_opcode {
CXL_MBOX_OP_INVALID = 0x0000,
CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
CXL_MBOX_OP_CLEAR_EVENT_RECORD = 0x0101,
CXL_MBOX_OP_GET_EVT_INT_POLICY = 0x0102,
CXL_MBOX_OP_SET_EVT_INT_POLICY = 0x0103,
CXL_MBOX_OP_GET_FW_INFO = 0x0200,
CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
CXL_MBOX_OP_SET_TIMESTAMP = 0x0301,
CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
CXL_MBOX_OP_GET_LOG = 0x0401,
CXL_MBOX_OP_IDENTIFY = 0x4000,
......@@ -347,6 +377,136 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;
/*
* Common Event Record Format
* CXL rev 3.0 section 8.2.9.2.1; Table 8-42
*/
struct cxl_event_record_hdr {
uuid_t id;
u8 length;
u8 flags[3];
__le16 handle;
__le16 related_handle;
__le64 timestamp;
u8 maint_op_class;
u8 reserved[15];
} __packed;
#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
struct cxl_event_record_raw {
struct cxl_event_record_hdr hdr;
u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
} __packed;
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
*/
#define CXL_GET_EVENT_FLAG_OVERFLOW BIT(0)
#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
struct cxl_get_event_payload {
u8 flags;
u8 reserved1;
__le16 overflow_err_count;
__le64 first_overflow_timestamp;
__le64 last_overflow_timestamp;
__le16 record_count;
u8 reserved2[10];
struct cxl_event_record_raw records[];
} __packed;
/*
* CXL rev 3.0 section 8.2.9.2.2; Table 8-49
*/
enum cxl_event_log_type {
CXL_EVENT_TYPE_INFO = 0x00,
CXL_EVENT_TYPE_WARN,
CXL_EVENT_TYPE_FAIL,
CXL_EVENT_TYPE_FATAL,
CXL_EVENT_TYPE_MAX
};
/*
* Clear Event Records input payload
* CXL rev 3.0 section 8.2.9.2.3; Table 8-51
*/
struct cxl_mbox_clear_event_payload {
u8 event_log; /* enum cxl_event_log_type */
u8 clear_flags;
u8 nr_recs;
u8 reserved[3];
__le16 handles[];
} __packed;
#define CXL_CLEAR_EVENT_MAX_HANDLES U8_MAX
/*
* General Media Event Record
* CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
*/
#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
struct cxl_event_gen_media {
struct cxl_event_record_hdr hdr;
__le64 phys_addr;
u8 descriptor;
u8 type;
u8 transaction_type;
u8 validity_flags[2];
u8 channel;
u8 rank;
u8 device[3];
u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
u8 reserved[46];
} __packed;
/*
* DRAM Event Record - DER
* CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
*/
#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
struct cxl_event_dram {
struct cxl_event_record_hdr hdr;
__le64 phys_addr;
u8 descriptor;
u8 type;
u8 transaction_type;
u8 validity_flags[2];
u8 channel;
u8 rank;
u8 nibble_mask[3];
u8 bank_group;
u8 bank;
u8 row[3];
u8 column[2];
u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
u8 reserved[0x17];
} __packed;
/*
* Get Health Info Record
* CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
*/
struct cxl_get_health_info {
u8 health_status;
u8 media_status;
u8 add_status;
u8 life_used;
u8 device_temp[2];
u8 dirty_shutdown_cnt[4];
u8 cor_vol_err_cnt[4];
u8 cor_per_err_cnt[4];
} __packed;
/*
* Memory Module Event Record
* CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
*/
struct cxl_event_mem_module {
struct cxl_event_record_hdr hdr;
u8 event_type;
struct cxl_get_health_info info;
u8 reserved[0x3d];
} __packed;
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
......@@ -372,6 +532,12 @@ struct cxl_mbox_set_partition_info {
#define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)
/* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */
struct cxl_mbox_set_timestamp_in {
__le64 timestamp;
} __packed;
/**
* struct cxl_mem_command - Driver representation of a memory device command
* @info: Command information as it exists for the UAPI
......@@ -393,7 +559,6 @@ struct cxl_mem_command {
struct cxl_command_info info;
enum cxl_opcode opcode;
u32 flags;
#define CXL_CMD_FLAG_NONE 0
#define CXL_CMD_FLAG_FORCE_ENABLE BIT(0)
};
......@@ -441,6 +606,9 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
void cxl_mem_get_event_records(struct cxl_dev_state *cxlds, u32 status);
int cxl_set_timestamp(struct cxl_dev_state *cxlds);
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
......
......@@ -53,6 +53,12 @@
#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
/*
* NOTE: Currently all the functions which are enabled for CXL require their
* vectors to be in the first 16. Use this as the default max.
*/
#define CXL_PCI_DEFAULT_MAX_VECTORS 16
/* Register Block Identifier (RBI) */
enum cxl_regloc_type {
CXL_REGLOC_RBI_EMPTY = 0,
......@@ -64,6 +70,10 @@ enum cxl_regloc_type {
int devm_cxl_port_enumerate_dports(struct cxl_port *port);
struct cxl_dev_state;
int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
#endif /* __CXL_PCI_H__ */
This diff is collapsed.
......@@ -76,6 +76,7 @@ static int cxl_nvdimm_probe(struct device *dev)
return rc;
set_bit(NDD_LABELING, &flags);
set_bit(NDD_REGISTER_SYNC, &flags);
set_bit(ND_CMD_GET_CONFIG_SIZE, &cmd_mask);
set_bit(ND_CMD_GET_CONFIG_DATA, &cmd_mask);
set_bit(ND_CMD_SET_CONFIG_DATA, &cmd_mask);
......
......@@ -30,57 +30,116 @@ static void schedule_detach(void *cxlmd)
schedule_cxl_memdev_detach(cxlmd);
}
static int cxl_port_probe(struct device *dev)
static int discover_region(struct device *dev, void *root)
{
struct cxl_endpoint_decoder *cxled;
int rc;
if (!is_endpoint_decoder(dev))
return 0;
cxled = to_cxl_endpoint_decoder(dev);
if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0)
return 0;
if (cxled->state != CXL_DECODER_STATE_AUTO)
return 0;
/*
* Region enumeration is opportunistic, if this add-event fails,
* continue to the next endpoint decoder.
*/
rc = cxl_add_to_region(root, cxled);
if (rc)
dev_dbg(dev, "failed to add to region: %#llx-%#llx\n",
cxled->cxld.hpa_range.start, cxled->cxld.hpa_range.end);
return 0;
}
static int cxl_switch_port_probe(struct cxl_port *port)
{
struct cxl_port *port = to_cxl_port(dev);
struct cxl_hdm *cxlhdm;
int rc;
rc = devm_cxl_port_enumerate_dports(port);
if (rc < 0)
return rc;
if (!is_cxl_endpoint(port)) {
rc = devm_cxl_port_enumerate_dports(port);
if (rc < 0)
return rc;
if (rc == 1)
return devm_cxl_add_passthrough_decoder(port);
}
if (rc == 1)
return devm_cxl_add_passthrough_decoder(port);
cxlhdm = devm_cxl_setup_hdm(port);
cxlhdm = devm_cxl_setup_hdm(port, NULL);
if (IS_ERR(cxlhdm))
return PTR_ERR(cxlhdm);
if (is_cxl_endpoint(port)) {
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
return devm_cxl_enumerate_decoders(cxlhdm, NULL);
}
/* Cache the data early to ensure is_visible() works */
read_cdat_data(port);
static int cxl_endpoint_port_probe(struct cxl_port *port)
{
struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
struct cxl_endpoint_dvsec_info info = { 0 };
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_hdm *cxlhdm;
struct cxl_port *root;
int rc;
get_device(&cxlmd->dev);
rc = devm_add_action_or_reset(dev, schedule_detach, cxlmd);
if (rc)
return rc;
rc = cxl_dvsec_rr_decode(cxlds->dev, cxlds->cxl_dvsec, &info);
if (rc < 0)
return rc;
rc = cxl_hdm_decode_init(cxlds, cxlhdm);
if (rc)
return rc;
cxlhdm = devm_cxl_setup_hdm(port, &info);
if (IS_ERR(cxlhdm))
return PTR_ERR(cxlhdm);
rc = cxl_await_media_ready(cxlds);
if (rc) {
dev_err(dev, "Media not active (%d)\n", rc);
return rc;
}
}
/* Cache the data early to ensure is_visible() works */
read_cdat_data(port);
rc = devm_cxl_enumerate_decoders(cxlhdm);
get_device(&cxlmd->dev);
rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
if (rc)
return rc;
rc = cxl_hdm_decode_init(cxlds, cxlhdm, &info);
if (rc)
return rc;
rc = cxl_await_media_ready(cxlds);
if (rc) {
dev_err(dev, "Couldn't enumerate decoders (%d)\n", rc);
dev_err(&port->dev, "Media not active (%d)\n", rc);
return rc;
}
rc = devm_cxl_enumerate_decoders(cxlhdm, &info);
if (rc)
return rc;
/*
* This can't fail in practice as CXL root exit unregisters all
* descendant ports and that in turn synchronizes with cxl_port_probe()
*/
root = find_cxl_root(&cxlmd->dev);
/*
* Now that all endpoint decoders are successfully enumerated, try to
* assemble regions from committed decoders
*/
device_for_each_child(&port->dev, root, discover_region);
put_device(&root->dev);
return 0;
}
static int cxl_port_probe(struct device *dev)
{
struct cxl_port *port = to_cxl_port(dev);
if (is_cxl_endpoint(port))
return cxl_endpoint_port_probe(port);
return cxl_switch_port_probe(port);
}
static ssize_t CDAT_read(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr, char *buf,
loff_t offset, size_t count)
......
......@@ -44,12 +44,25 @@ config DEV_DAX_HMEM
Say M if unsure.
config DEV_DAX_CXL
tristate "CXL DAX: direct access to CXL RAM regions"
depends on CXL_BUS && CXL_REGION && DEV_DAX
default CXL_REGION && DEV_DAX
help
CXL RAM regions are either mapped by platform-firmware
and published in the initial system-memory map as "System RAM", mapped
by platform-firmware as "Soft Reserved", or dynamically provisioned
after boot by the CXL driver. In the latter two cases a device-dax
instance is created to access that unmapped-by-default address range.
Per usual it can remain as dedicated access via a device interface, or
converted to "System RAM" via the dax_kmem facility.
config DEV_DAX_HMEM_DEVICES
depends on DEV_DAX_HMEM && DAX=y
depends on DEV_DAX_HMEM && DAX
def_bool y
config DEV_DAX_KMEM
tristate "KMEM DAX: volatile-use of persistent memory"
tristate "KMEM DAX: map dax-devices as System-RAM"
default DEV_DAX
depends on DEV_DAX
depends on MEMORY_HOTPLUG # for add_memory() and friends
......
......@@ -3,10 +3,12 @@ obj-$(CONFIG_DAX) += dax.o
obj-$(CONFIG_DEV_DAX) += device_dax.o
obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o
obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o
dax-y := super.o
dax-y += bus.o
device_dax-y := device.o
dax_pmem-y := pmem.o
dax_cxl-y := cxl.o
obj-y += hmem/
......@@ -56,6 +56,25 @@ static int dax_match_id(struct dax_device_driver *dax_drv, struct device *dev)
return match;
}
static int dax_match_type(struct dax_device_driver *dax_drv, struct device *dev)
{
enum dax_driver_type type = DAXDRV_DEVICE_TYPE;
struct dev_dax *dev_dax = to_dev_dax(dev);
if (dev_dax->region->res.flags & IORESOURCE_DAX_KMEM)
type = DAXDRV_KMEM_TYPE;
if (dax_drv->type == type)
return 1;
/* default to device mode if dax_kmem is disabled */
if (dax_drv->type == DAXDRV_DEVICE_TYPE &&
!IS_ENABLED(CONFIG_DEV_DAX_KMEM))
return 1;
return 0;
}
enum id_action {
ID_REMOVE,
ID_ADD,
......@@ -216,14 +235,9 @@ static int dax_bus_match(struct device *dev, struct device_driver *drv)
{
struct dax_device_driver *dax_drv = to_dax_drv(drv);
/*
* All but the 'device-dax' driver, which has 'match_always'
* set, requires an exact id match.
*/
if (dax_drv->match_always)
if (dax_match_id(dax_drv, dev))
return 1;
return dax_match_id(dax_drv, dev);
return dax_match_type(dax_drv, dev);
}
/*
......@@ -427,8 +441,8 @@ static void unregister_dev_dax(void *dev)
dev_dbg(dev, "%s\n", __func__);
kill_dev_dax(dev_dax);
free_dev_dax_ranges(dev_dax);
device_del(dev);
free_dev_dax_ranges(dev_dax);
put_device(dev);
}
......@@ -1413,13 +1427,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
}
EXPORT_SYMBOL_GPL(devm_create_dev_dax);
static int match_always_count;
int __dax_driver_register(struct dax_device_driver *dax_drv,
struct module *module, const char *mod_name)
{
struct device_driver *drv = &dax_drv->drv;
int rc = 0;
/*
* dax_bus_probe() calls dax_drv->probe() unconditionally.
......@@ -1434,26 +1445,7 @@ int __dax_driver_register(struct dax_device_driver *dax_drv,
drv->mod_name = mod_name;
drv->bus = &dax_bus_type;
/* there can only be one default driver */
mutex_lock(&dax_bus_lock);
match_always_count += dax_drv->match_always;
if (match_always_count > 1) {
match_always_count--;
WARN_ON(1);
rc = -EINVAL;
}
mutex_unlock(&dax_bus_lock);
if (rc)
return rc;
rc = driver_register(drv);
if (rc && dax_drv->match_always) {
mutex_lock(&dax_bus_lock);
match_always_count -= dax_drv->match_always;
mutex_unlock(&dax_bus_lock);
}
return rc;
return driver_register(drv);
}
EXPORT_SYMBOL_GPL(__dax_driver_register);
......@@ -1463,7 +1455,6 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv)
struct dax_id *dax_id, *_id;
mutex_lock(&dax_bus_lock);
match_always_count -= dax_drv->match_always;
list_for_each_entry_safe(dax_id, _id, &dax_drv->ids, list) {
list_del(&dax_id->list);
kfree(dax_id);
......
......@@ -11,7 +11,10 @@ struct dax_device;
struct dax_region;
void dax_region_put(struct dax_region *dax_region);
#define IORESOURCE_DAX_STATIC (1UL << 0)
/* dax bus specific ioresource flags */
#define IORESOURCE_DAX_STATIC BIT(0)
#define IORESOURCE_DAX_KMEM BIT(1)
struct dax_region *alloc_dax_region(struct device *parent, int region_id,
struct range *range, int target_node, unsigned int align,
unsigned long flags);
......@@ -25,10 +28,15 @@ struct dev_dax_data {
struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
enum dax_driver_type {
DAXDRV_KMEM_TYPE,
DAXDRV_DEVICE_TYPE,
};
struct dax_device_driver {
struct device_driver drv;
struct list_head ids;
int match_always;
enum dax_driver_type type;
int (*probe)(struct dev_dax *dev);
void (*remove)(struct dev_dax *dev);
};
......
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2023 Intel Corporation. All rights reserved. */
#include <linux/module.h>
#include <linux/dax.h>
#include "../cxl/cxl.h"
#include "bus.h"
static int cxl_dax_region_probe(struct device *dev)
{
struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev);
int nid = phys_to_target_node(cxlr_dax->hpa_range.start);
struct cxl_region *cxlr = cxlr_dax->cxlr;
struct dax_region *dax_region;
struct dev_dax_data data;
struct dev_dax *dev_dax;
if (nid == NUMA_NO_NODE)
nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
PMD_SIZE, IORESOURCE_DAX_KMEM);
if (!dax_region)
return -ENOMEM;
data = (struct dev_dax_data) {
.dax_region = dax_region,
.id = -1,
.size = range_len(&cxlr_dax->hpa_range),
};
dev_dax = devm_create_dev_dax(&data);
if (IS_ERR(dev_dax))
return PTR_ERR(dev_dax);
/* child dev_dax instances now own the lifetime of the dax_region */
dax_region_put(dax_region);
return 0;
}
static struct cxl_driver cxl_dax_region_driver = {
.name = "cxl_dax_region",
.probe = cxl_dax_region_probe,
.id = CXL_DEVICE_DAX_REGION,
.drv = {
.suppress_bind_attrs = true,
},
};
module_cxl_driver(cxl_dax_region_driver);
MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Intel Corporation");
MODULE_IMPORT_NS(CXL);
......@@ -475,8 +475,7 @@ EXPORT_SYMBOL_GPL(dev_dax_probe);
static struct dax_device_driver device_dax_driver = {
.probe = dev_dax_probe,
/* all probe actions are unwound by devm, so .remove isn't necessary */
.match_always = 1,
.type = DAXDRV_DEVICE_TYPE,
};
static int __init dax_init(void)
......
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
# device_hmem.o deliberately precedes dax_hmem.o for initcall ordering
obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device_hmem.o
obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
device_hmem-y := device.o
dax_hmem-y := hmem.o
......@@ -8,6 +8,8 @@
static bool nohmem;
module_param_named(disable, nohmem, bool, 0444);
static bool platform_initialized;
static DEFINE_MUTEX(hmem_resource_lock);
static struct resource hmem_active = {
.name = "HMEM devices",
.start = 0,
......@@ -15,80 +17,66 @@ static struct resource hmem_active = {
.flags = IORESOURCE_MEM,
};
void hmem_register_device(int target_nid, struct resource *r)
int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
{
struct resource *res;
int rc = 0;
mutex_lock(&hmem_resource_lock);
for (res = hmem_active.child; res; res = res->sibling) {
rc = fn(host, (int) res->desc, res);
if (rc)
break;
}
mutex_unlock(&hmem_resource_lock);
return rc;
}
EXPORT_SYMBOL_GPL(walk_hmem_resources);
static void __hmem_register_resource(int target_nid, struct resource *res)
{
/* define a clean / non-busy resource for the platform device */
struct resource res = {
.start = r->start,
.end = r->end,
.flags = IORESOURCE_MEM,
.desc = IORES_DESC_SOFT_RESERVED,
};
struct platform_device *pdev;
struct memregion_info info;
int rc, id;
struct resource *new;
int rc;
if (nohmem)
new = __request_region(&hmem_active, res->start, resource_size(res), "",
0);
if (!new) {
pr_debug("hmem range %pr already active\n", res);
return;
}
rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
IORES_DESC_SOFT_RESERVED);
if (rc != REGION_INTERSECTS)
return;
new->desc = target_nid;
id = memregion_alloc(GFP_KERNEL);
if (id < 0) {
pr_err("memregion allocation failure for %pr\n", &res);
if (platform_initialized)
return;
}
pdev = platform_device_alloc("hmem", id);
pdev = platform_device_alloc("hmem_platform", 0);
if (!pdev) {
pr_err("hmem device allocation failure for %pr\n", &res);
goto out_pdev;
}
if (!__request_region(&hmem_active, res.start, resource_size(&res),
dev_name(&pdev->dev), 0)) {
dev_dbg(&pdev->dev, "hmem range %pr already active\n", &res);
goto out_active;
}
pdev->dev.numa_node = numa_map_to_online_node(target_nid);
info = (struct memregion_info) {
.target_node = target_nid,
};
rc = platform_device_add_data(pdev, &info, sizeof(info));
if (rc < 0) {
pr_err("hmem memregion_info allocation failure for %pr\n", &res);
goto out_resource;
}
rc = platform_device_add_resources(pdev, &res, 1);
if (rc < 0) {
pr_err("hmem resource allocation failure for %pr\n", &res);
goto out_resource;
pr_err_once("failed to register device-dax hmem_platform device\n");
return;
}
rc = platform_device_add(pdev);
if (rc < 0) {
dev_err(&pdev->dev, "device add failed for %pr\n", &res);
goto out_resource;
}
if (rc)
platform_device_put(pdev);
else
platform_initialized = true;
}
return;
void hmem_register_resource(int target_nid, struct resource *res)
{
if (nohmem)
return;
out_resource:
__release_region(&hmem_active, res.start, resource_size(&res));
out_active:
platform_device_put(pdev);
out_pdev:
memregion_free(id);
mutex_lock(&hmem_resource_lock);
__hmem_register_resource(target_nid, res);
mutex_unlock(&hmem_resource_lock);
}
static __init int hmem_register_one(struct resource *res, void *data)
{
hmem_register_device(phys_to_target_node(res->start), res);
hmem_register_resource(phys_to_target_node(res->start), res);
return 0;
}
......@@ -104,4 +92,4 @@ static __init int hmem_init(void)
* As this is a fallback for address ranges unclaimed by the ACPI HMAT
* parsing it must be at an initcall level greater than hmat_init().
*/
late_initcall(hmem_init);
device_initcall(hmem_init);
......@@ -3,6 +3,7 @@
#include <linux/memregion.h>
#include <linux/module.h>
#include <linux/pfn_t.h>
#include <linux/dax.h>
#include "../bus.h"
static bool region_idle;
......@@ -10,30 +11,32 @@ module_param_named(region_idle, region_idle, bool, 0644);
static int dax_hmem_probe(struct platform_device *pdev)
{
unsigned long flags = IORESOURCE_DAX_KMEM;
struct device *dev = &pdev->dev;
struct dax_region *dax_region;
struct memregion_info *mri;
struct dev_dax_data data;
struct dev_dax *dev_dax;
struct resource *res;
struct range range;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
return -ENOMEM;
/*
* @region_idle == true indicates that an administrative agent
* wants to manipulate the range partitioning before the devices
* are created, so do not send them to the dax_kmem driver by
* default.
*/
if (region_idle)
flags = 0;
mri = dev->platform_data;
range.start = res->start;
range.end = res->end;
dax_region = alloc_dax_region(dev, pdev->id, &range, mri->target_node,
PMD_SIZE, 0);
dax_region = alloc_dax_region(dev, pdev->id, &mri->range,
mri->target_node, PMD_SIZE, flags);
if (!dax_region)
return -ENOMEM;
data = (struct dev_dax_data) {
.dax_region = dax_region,
.id = -1,
.size = region_idle ? 0 : resource_size(res),
.size = region_idle ? 0 : range_len(&mri->range),
};
dev_dax = devm_create_dev_dax(&data);
if (IS_ERR(dev_dax))
......@@ -44,22 +47,131 @@ static int dax_hmem_probe(struct platform_device *pdev)
return 0;
}
static int dax_hmem_remove(struct platform_device *pdev)
{
/* devm handles teardown */
return 0;
}
static struct platform_driver dax_hmem_driver = {
.probe = dax_hmem_probe,
.remove = dax_hmem_remove,
.driver = {
.name = "hmem",
},
};
module_platform_driver(dax_hmem_driver);
static void release_memregion(void *data)
{
memregion_free((long) data);
}
static void release_hmem(void *pdev)
{
platform_device_unregister(pdev);
}
static int hmem_register_device(struct device *host, int target_nid,
const struct resource *res)
{
struct platform_device *pdev;
struct memregion_info info;
long id;
int rc;
if (IS_ENABLED(CONFIG_CXL_REGION) &&
region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
IORES_DESC_CXL) != REGION_DISJOINT) {
dev_dbg(host, "deferring range to CXL: %pr\n", res);
return 0;
}
rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
IORES_DESC_SOFT_RESERVED);
if (rc != REGION_INTERSECTS)
return 0;
id = memregion_alloc(GFP_KERNEL);
if (id < 0) {
dev_err(host, "memregion allocation failure for %pr\n", res);
return -ENOMEM;
}
rc = devm_add_action_or_reset(host, release_memregion, (void *) id);
if (rc)
return rc;
pdev = platform_device_alloc("hmem", id);
if (!pdev) {
dev_err(host, "device allocation failure for %pr\n", res);
return -ENOMEM;
}
pdev->dev.numa_node = numa_map_to_online_node(target_nid);
info = (struct memregion_info) {
.target_node = target_nid,
.range = {
.start = res->start,
.end = res->end,
},
};
rc = platform_device_add_data(pdev, &info, sizeof(info));
if (rc < 0) {
dev_err(host, "memregion_info allocation failure for %pr\n",
res);
goto out_put;
}
rc = platform_device_add(pdev);
if (rc < 0) {
dev_err(host, "%s add failed for %pr\n", dev_name(&pdev->dev),
res);
goto out_put;
}
return devm_add_action_or_reset(host, release_hmem, pdev);
out_put:
platform_device_put(pdev);
return rc;
}
static int dax_hmem_platform_probe(struct platform_device *pdev)
{
return walk_hmem_resources(&pdev->dev, hmem_register_device);
}
static struct platform_driver dax_hmem_platform_driver = {
.probe = dax_hmem_platform_probe,
.driver = {
.name = "hmem_platform",
},
};
static __init int dax_hmem_init(void)
{
int rc;
rc = platform_driver_register(&dax_hmem_platform_driver);
if (rc)
return rc;
rc = platform_driver_register(&dax_hmem_driver);
if (rc)
platform_driver_unregister(&dax_hmem_platform_driver);
return rc;
}
static __exit void dax_hmem_exit(void)
{
platform_driver_unregister(&dax_hmem_driver);
platform_driver_unregister(&dax_hmem_platform_driver);
}
module_init(dax_hmem_init);
module_exit(dax_hmem_exit);
/* Allow for CXL to define its own dax regions */
#if IS_ENABLED(CONFIG_CXL_REGION)
#if IS_MODULE(CONFIG_CXL_ACPI)
MODULE_SOFTDEP("pre: cxl_acpi");
#endif
#endif
MODULE_ALIAS("platform:hmem*");
MODULE_ALIAS("platform:hmem_platform*");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Intel Corporation");
......@@ -146,7 +146,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (rc) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
i, range.start, range.end);
release_resource(res);
remove_resource(res);
kfree(res);
data->res[i] = NULL;
if (mapped)
......@@ -195,7 +195,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
rc = remove_memory(range.start, range_len(&range));
if (rc == 0) {
release_resource(data->res[i]);
remove_resource(data->res[i]);
kfree(data->res[i]);
data->res[i] = NULL;
success++;
......@@ -239,6 +239,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
static struct dax_device_driver device_dax_kmem_driver = {
.probe = dev_dax_kmem_probe,
.remove = dev_dax_kmem_remove,
.type = DAXDRV_KMEM_TYPE,
};
static int __init dax_kmem_init(void)
......
......@@ -508,7 +508,7 @@ static void nd_async_device_unregister(void *d, async_cookie_t cookie)
put_device(dev);
}
void nd_device_register(struct device *dev)
static void __nd_device_register(struct device *dev, bool sync)
{
if (!dev)
return;
......@@ -531,11 +531,24 @@ void nd_device_register(struct device *dev)
}
get_device(dev);
async_schedule_dev_domain(nd_async_device_register, dev,
&nd_async_domain);
if (sync)
nd_async_device_register(dev, 0);
else
async_schedule_dev_domain(nd_async_device_register, dev,
&nd_async_domain);
}
void nd_device_register(struct device *dev)
{
__nd_device_register(dev, false);
}
EXPORT_SYMBOL(nd_device_register);
void nd_device_register_sync(struct device *dev)
{
__nd_device_register(dev, true);
}
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
bool killed;
......
......@@ -624,7 +624,10 @@ struct nvdimm *__nvdimm_create(struct nvdimm_bus *nvdimm_bus,
nvdimm->sec.ext_flags = nvdimm_security_flags(nvdimm, NVDIMM_MASTER);
device_initialize(dev);
lockdep_set_class(&dev->mutex, &nvdimm_key);
nd_device_register(dev);
if (test_bit(NDD_REGISTER_SYNC, &flags))
nd_device_register_sync(dev);
else
nd_device_register(dev);
return nvdimm;
}
......
......@@ -107,6 +107,7 @@ int nvdimm_bus_create_ndctl(struct nvdimm_bus *nvdimm_bus);
void nvdimm_bus_destroy_ndctl(struct nvdimm_bus *nvdimm_bus);
void nd_synchronize(void);
void nd_device_register(struct device *dev);
void nd_device_register_sync(struct device *dev);
struct nd_label_id;
char *nd_label_gen_id(struct nd_label_id *label_id, const uuid_t *uuid,
u32 flags);
......
......@@ -596,6 +596,7 @@ static void pci_init_host_bridge(struct pci_host_bridge *bridge)
bridge->native_ltr = 1;
bridge->native_dpc = 1;
bridge->domain_nr = PCI_DOMAIN_NR_NOT_SET;
bridge->native_cxl_error = 1;
device_initialize(&bridge->dev);
}
......
......@@ -262,11 +262,14 @@ static inline bool dax_mapping(struct address_space *mapping)
}
#ifdef CONFIG_DEV_DAX_HMEM_DEVICES
void hmem_register_device(int target_nid, struct resource *r);
void hmem_register_resource(int target_nid, struct resource *r);
#else
static inline void hmem_register_device(int target_nid, struct resource *r)
static inline void hmem_register_resource(int target_nid, struct resource *r)
{
}
#endif
typedef int (*walk_hmem_fn)(struct device *dev, int target_nid,
const struct resource *res);
int walk_hmem_resources(struct device *dev, walk_hmem_fn fn);
#endif
......@@ -41,6 +41,9 @@ enum {
*/
NDD_INCOHERENT = 7,
/* dimm provider wants synchronous registration by __nvdimm_create() */
NDD_REGISTER_SYNC = 8,
/* need to set a limit somewhere, but yes, this is likely overkill */
ND_IOCTL_MAX_BUFLEN = SZ_4M,
ND_CMD_MAX_ELEM = 5,
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment