Commit ee96dd96 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'libnvdimm-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "The update for this cycle includes the deprecation of block-aperture
  mode and a new perf events interface for the papr_scm nvdimm driver.

  The perf events approach was acked by PeterZ.

   - Add perf support for nvdimm events, initially only for 'papr_scm'
     devices.

   - Deprecate the 'block aperture' support in libnvdimm, it only ever
     existed in the specification, not in shipping product"

* tag 'libnvdimm-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nvdimm/blk: Fix title level
  MAINTAINERS: remove section LIBNVDIMM BLK: MMIO-APERTURE DRIVER
  powerpc/papr_scm: Fix build failure when
  drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set
  nvdimm/region: Delete nd_blk_region infrastructure
  ACPI: NFIT: Remove block aperture support
  nvdimm/namespace: Delete nd_namespace_blk
  nvdimm/namespace: Delete blk namespace consideration in shared paths
  nvdimm/blk: Delete the block-aperture window driver
  nvdimm/region: Fix default alignment for small regions
  docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for nvdimm pmu
  powerpc/papr_scm: Add perf interface support
  drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  drivers/nvdimm: Add nvdimm pmu structure
parents d888c83f ada8d8d3
......@@ -6,3 +6,38 @@ Description:
The libnvdimm sub-system implements a common sysfs interface for
platform nvdimm resources. See Documentation/driver-api/nvdimm/.
What: /sys/bus/event_source/devices/nmemX/format
Date: February 2022
KernelVersion: 5.18
Contact: Kajol Jain <kjain@linux.ibm.com>
Description: (RO) Attribute group to describe the magic bits
that go into perf_event_attr.config for a particular pmu.
(See ABI/testing/sysfs-bus-event_source-devices-format).
Each attribute under this group defines a bit range of the
perf_event_attr.config. Supported attribute is listed
below::
event = "config:0-4" - event ID
For example::
ctl_res_cnt = "event=0x1"
What: /sys/bus/event_source/devices/nmemX/events
Date: February 2022
KernelVersion: 5.18
Contact: Kajol Jain <kjain@linux.ibm.com>
Description: (RO) Attribute group to describe performance monitoring events
for the nvdimm memory device. Each attribute in this group
describes a single performance monitoring event supported by
this nvdimm pmu. The name of the file is the name of the event.
(See ABI/testing/sysfs-bus-event_source-devices-events). A
listing of the events supported by a given nvdimm provider type
can be found in Documentation/driver-api/nvdimm/$provider.
What: /sys/bus/event_source/devices/nmemX/cpumask
Date: February 2022
KernelVersion: 5.18
Contact: Kajol Jain <kjain@linux.ibm.com>
Description: (RO) This sysfs file exposes the cpumask which is designated to
to retrieve nvdimm pmu event counter data.
......@@ -14,10 +14,8 @@ Version 13
Overview
Supporting Documents
Git Trees
LIBNVDIMM PMEM and BLK
Why BLK?
PMEM vs BLK
BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
LIBNVDIMM PMEM
PMEM-REGIONs, Atomic Sectors, and DAX
Example NVDIMM Platform
LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
LIBNDCTL: Context
......@@ -53,19 +51,12 @@ PMEM:
block device composed of PMEM is capable of DAX. A PMEM address range
may span an interleave of several DIMMs.
BLK:
A set of one or more programmable memory mapped apertures provided
by a DIMM to access its media. This indirection precludes the
performance benefit of interleaving, but enables DIMM-bounded failure
modes.
DPA:
DIMM Physical Address, is a DIMM-relative offset. With one DIMM in
the system there would be a 1:1 system-physical-address:DPA association.
Once more DIMMs are added a memory controller interleave must be
decoded to determine the DPA associated with a given
system-physical-address. BLK capacity always has a 1:1 relationship
with a single-DIMM's DPA range.
system-physical-address.
DAX:
File system extensions to bypass the page cache and block layer to
......@@ -84,30 +75,30 @@ BTT:
Block Translation Table: Persistent memory is byte addressable.
Existing software may have an expectation that the power-fail-atomicity
of writes is at least one sector, 512 bytes. The BTT is an indirection
table with atomic update semantics to front a PMEM/BLK block device
table with atomic update semantics to front a PMEM block device
driver and present arbitrary atomic sector sizes.
LABEL:
Metadata stored on a DIMM device that partitions and identifies
(persistently names) storage between PMEM and BLK. It also partitions
BLK storage to host BTTs with different parameters per BLK-partition.
Note that traditional partition tables, GPT/MBR, are layered on top of a
BLK or PMEM device.
(persistently names) capacity allocated to different PMEM namespaces. It
also indicates whether an address abstraction like a BTT is applied to
the namepsace. Note that traditional partition tables, GPT/MBR, are
layered on top of a PMEM namespace, or an address abstraction like BTT
if present, but partition support is deprecated going forward.
Overview
========
The LIBNVDIMM subsystem provides support for three types of NVDIMMs, namely,
PMEM, BLK, and NVDIMM devices that can simultaneously support both PMEM
and BLK mode access. These three modes of operation are described by
the "NVDIMM Firmware Interface Table" (NFIT) in ACPI 6. While the LIBNVDIMM
implementation is generic and supports pre-NFIT platforms, it was guided
by the superset of capabilities need to support this ACPI 6 definition
for NVDIMM resources. The bulk of the kernel implementation is in place
to handle the case where DPA accessible via PMEM is aliased with DPA
accessible via BLK. When that occurs a LABEL is needed to reserve DPA
for exclusive access via one mode a time.
The LIBNVDIMM subsystem provides support for PMEM described by platform
firmware or a device driver. On ACPI based systems the platform firmware
conveys persistent memory resource via the ACPI NFIT "NVDIMM Firmware
Interface Table" in ACPI 6. While the LIBNVDIMM subsystem implementation
is generic and supports pre-NFIT platforms, it was guided by the
superset of capabilities need to support this ACPI 6 definition for
NVDIMM resources. The original implementation supported the
block-window-aperture capability described in the NFIT, but that support
has since been abandoned and never shipped in a product.
Supporting Documents
--------------------
......@@ -125,107 +116,38 @@ Git Trees
---------
LIBNVDIMM:
https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git
https://git.kernel.org/cgit/linux/kernel/git/nvdimm/nvdimm.git
LIBNDCTL:
https://github.com/pmem/ndctl.git
PMEM:
https://github.com/01org/prd
LIBNVDIMM PMEM and BLK
======================
LIBNVDIMM PMEM
==============
Prior to the arrival of the NFIT, non-volatile memory was described to a
system in various ad-hoc ways. Usually only the bare minimum was
provided, namely, a single system-physical-address range where writes
are expected to be durable after a system power loss. Now, the NFIT
specification standardizes not only the description of PMEM, but also
BLK and platform message-passing entry points for control and
configuration.
For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block
device driver:
1. PMEM (nd_pmem.ko): Drives a system-physical-address range. This
range is contiguous in system memory and may be interleaved (hardware
memory controller striped) across multiple DIMMs. When interleaved the
platform may optionally provide details of which DIMMs are participating
in the interleave.
Note that while LIBNVDIMM describes system-physical-address ranges that may
alias with BLK access as ND_NAMESPACE_PMEM ranges and those without
alias as ND_NAMESPACE_IO ranges, to the nd_pmem driver there is no
distinction. The different device-types are an implementation detail
that userspace can exploit to implement policies like "only interface
with address ranges from certain DIMMs". It is worth noting that when
aliasing is present and a DIMM lacks a label, then no block device can
be created by default as userspace needs to do at least one allocation
of DPA to the PMEM range. In contrast ND_NAMESPACE_IO ranges, once
registered, can be immediately attached to nd_pmem.
2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
defined apertures. A set of apertures will access just one DIMM.
Multiple windows (apertures) allow multiple concurrent accesses, much like
tagged-command-queuing, and would likely be used by different threads or
different CPUs.
The NFIT specification defines a standard format for a BLK-aperture, but
the spec also allows for vendor specific layouts, and non-NFIT BLK
implementations may have other designs for BLK I/O. For this reason
"nd_blk" calls back into platform-specific code to perform the I/O.
One such implementation is defined in the "Driver Writer's Guide" and "DSM
Interface Example".
Why BLK?
========
platform message-passing entry points for control and configuration.
PMEM (nd_pmem.ko): Drives a system-physical-address range. This range is
contiguous in system memory and may be interleaved (hardware memory controller
striped) across multiple DIMMs. When interleaved the platform may optionally
provide details of which DIMMs are participating in the interleave.
It is worth noting that when the labeling capability is detected (a EFI
namespace label index block is found), then no block device is created
by default as userspace needs to do at least one allocation of DPA to
the PMEM range. In contrast ND_NAMESPACE_IO ranges, once registered,
can be immediately attached to nd_pmem. This latter mode is called
label-less or "legacy".
PMEM-REGIONs, Atomic Sectors, and DAX
-------------------------------------
While PMEM provides direct byte-addressable CPU-load/store access to
NVDIMM storage, it does not provide the best system RAS (recovery,
availability, and serviceability) model. An access to a corrupted
system-physical-address address causes a CPU exception while an access
to a corrupted address through an BLK-aperture causes that block window
to raise an error status in a register. The latter is more aligned with
the standard error model that host-bus-adapter attached disks present.
Also, if an administrator ever wants to replace a memory it is easier to
service a system at DIMM module boundaries. Compare this to PMEM where
data could be interleaved in an opaque hardware specific manner across
several DIMMs.
PMEM vs BLK
-----------
BLK-apertures solve these RAS problems, but their presence is also the
major contributing factor to the complexity of the ND subsystem. They
complicate the implementation because PMEM and BLK alias in DPA space.
Any given DIMM's DPA-range may contribute to one or more
system-physical-address sets of interleaved DIMMs, *and* may also be
accessed in its entirety through its BLK-aperture. Accessing a DPA
through a system-physical-address while simultaneously accessing the
same DPA through a BLK-aperture has undefined results. For this reason,
DIMMs with this dual interface configuration include a DSM function to
store/retrieve a LABEL. The LABEL effectively partitions the DPA-space
into exclusive system-physical-address and BLK-aperture accessible
regions. For simplicity a DIMM is allowed a PMEM "region" per each
interleave set in which it is a member. The remaining DPA space can be
carved into an arbitrary number of BLK devices with discontiguous
extents.
BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
One of the few
reasons to allow multiple BLK namespaces per REGION is so that each
BLK-namespace can be configured with a BTT with unique atomic sector
sizes. While a PMEM device can host a BTT the LABEL specification does
not provide for a sector size to be specified for a PMEM namespace.
This is due to the expectation that the primary usage model for PMEM is
via DAX, and the BTT is incompatible with DAX. However, for the cases
where an application or filesystem still needs atomic sector update
guarantees it can register a BTT on a PMEM device or partition. See
For the cases where an application or filesystem still needs atomic sector
update guarantees it can register a BTT on a PMEM device or partition. See
LIBNVDIMM/NDCTL: Block Translation Table "btt"
......@@ -236,51 +158,40 @@ For the remainder of this document the following diagram will be
referenced for any example sysfs layouts::
(a) (b) DIMM BLK-REGION
(a) (b) DIMM
+-------------------+--------+--------+--------+
+------+ | pm0.0 | blk2.0 | pm1.0 | blk2.1 | 0 region2
+------+ | pm0.0 | free | pm1.0 | free | 0
| imc0 +--+- - - region0- - - +--------+ +--------+
+--+---+ | pm0.0 | blk3.0 | pm1.0 | blk3.1 | 1 region3
+--+---+ | pm0.0 | free | pm1.0 | free | 1
| +-------------------+--------v v--------+
+--+---+ | |
| cpu0 | region1
+--+---+ | |
| +----------------------------^ ^--------+
+--+---+ | blk4.0 | pm1.0 | blk4.0 | 2 region4
+--+---+ | free | pm1.0 | free | 2
| imc1 +--+----------------------------| +--------+
+------+ | blk5.0 | pm1.0 | blk5.0 | 3 region5
+------+ | free | pm1.0 | free | 3
+----------------------------+--------+--------+
In this platform we have four DIMMs and two memory controllers in one
socket. Each unique interface (BLK or PMEM) to DPA space is identified
by a region device with a dynamically assigned id (REGION0 - REGION5).
socket. Each PMEM interleave set is identified by a region device with
a dynamically assigned id.
1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
single PMEM namespace is created in the REGION0-SPA-range that spans most
of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
interleaved system-physical-address range is reclaimed as BLK-aperture
accessed space starting at DPA-offset (a) into each DIMM. In that
reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
REGION3 where "blk2.0" and "blk3.0" are just human readable names that
could be set to any user-desired name in the LABEL.
interleaved system-physical-address range is left free for
another PMEM namespace to be defined.
2. In the last portion of DIMM0 and DIMM1 we have an interleaved
system-physical-address range, REGION1, that spans those two DIMMs as
well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM namespace
named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
"blk5.0".
3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
interleaved system-physical-address range (i.e. the DPA address past
offset (b) are also included in the "blk4.0" and "blk5.0" namespaces.
Note, that this example shows that BLK-aperture namespaces don't need to
be contiguous in DPA-space.
named "pm1.0".
This bus is provided by the kernel under the device
/sys/devices/platform/nfit_test.0 when the nfit_test.ko module from
tools/testing/nvdimm is loaded. This not only test LIBNVDIMM but the
acpi_nfit.ko driver as well.
tools/testing/nvdimm is loaded. This module is a unit test for
LIBNVDIMM and the acpi_nfit.ko driver.
LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
......@@ -469,17 +380,14 @@ identified by an "nfit_handle" a 32-bit value where:
LIBNVDIMM/LIBNDCTL: Region
--------------------------
A generic REGION device is registered for each PMEM range or BLK-aperture
set. Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
sets on the "nfit_test.0" bus. The primary role of regions are to be a
container of "mappings". A mapping is a tuple of <DIMM,
DPA-start-offset, length>.
A generic REGION device is registered for each PMEM interleave-set /
range. Per the example there are 2 PMEM regions on the "nfit_test.0"
bus. The primary role of regions are to be a container of "mappings". A
mapping is a tuple of <DIMM, DPA-start-offset, length>.
LIBNVDIMM provides a built-in driver for these REGION devices. This driver
is responsible for reconciling the aliased DPA mappings across all
regions, parsing the LABEL, if present, and then emitting NAMESPACE
devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
nd_blk device driver to consume.
LIBNVDIMM provides a built-in driver for REGION devices. This driver
is responsible for all parsing LABELs, if present, and then emitting NAMESPACE
devices for the nd_pmem driver to consume.
In addition to the generic attributes of "mapping"s, "interleave_ways"
and "size" the REGION device also exports some convenience attributes.
......@@ -493,8 +401,6 @@ LIBNVDIMM: region::
struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region_desc *ndr_desc);
struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region_desc *ndr_desc);
::
......@@ -527,8 +433,9 @@ LIBNDCTL: region enumeration example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sample region retrieval routines based on NFIT-unique data like
"spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
BLK::
"spa_index" (interleave set id).
::
static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
unsigned int spa_index)
......@@ -544,139 +451,23 @@ BLK::
return NULL;
}
static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
unsigned int handle)
{
struct ndctl_region *region;
ndctl_region_foreach(bus, region) {
struct ndctl_mapping *map;
if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLOCK)
continue;
ndctl_mapping_foreach(region, map) {
struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
if (ndctl_dimm_get_handle(dimm) == handle)
return region;
}
}
return NULL;
}
Why Not Encode the Region Type into the Region Name?
----------------------------------------------------
At first glance it seems since NFIT defines just PMEM and BLK interface
types that we should simply name REGION devices with something derived
from those type names. However, the ND subsystem explicitly keeps the
REGION name generic and expects userspace to always consider the
region-attributes for four reasons:
1. There are already more than two REGION and "namespace" types. For
PMEM there are two subtypes. As mentioned previously we have PMEM where
the constituent DIMM devices are known and anonymous PMEM. For BLK
regions the NFIT specification already anticipates vendor specific
implementations. The exact distinction of what a region contains is in
the region-attributes not the region-name or the region-devtype.
2. A region with zero child-namespaces is a possible configuration. For
example, the NFIT allows for a DCR to be published without a
corresponding BLK-aperture. This equates to a DIMM that can only accept
control/configuration messages, but no i/o through a descendant block
device. Again, this "type" is advertised in the attributes ('mappings'
== 0) and the name does not tell you much.
3. What if a third major interface type arises in the future? Outside
of vendor specific implementations, it's not difficult to envision a
third class of interface type beyond BLK and PMEM. With a generic name
for the REGION level of the device-hierarchy old userspace
implementations can still make sense of new kernel advertised
region-types. Userspace can always rely on the generic region
attributes like "mappings", "size", etc and the expected child devices
named "namespace". This generic format of the device-model hierarchy
allows the LIBNVDIMM and LIBNDCTL implementations to be more uniform and
future-proof.
4. There are more robust mechanisms for determining the major type of a
region than a device name. See the next section, How Do I Determine the
Major Type of a Region?
How Do I Determine the Major Type of a Region?
----------------------------------------------
Outside of the blanket recommendation of "use libndctl", or simply
looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
"nstype" integer attribute, here are some other options.
1. module alias lookup
^^^^^^^^^^^^^^^^^^^^^^
The whole point of region/namespace device type differentiation is to
decide which block-device driver will attach to a given LIBNVDIMM namespace.
One can simply use the modalias to lookup the resulting module. It's
important to note that this method is robust in the presence of a
vendor-specific driver down the road. If a vendor-specific
implementation wants to supplant the standard nd_blk driver it can with
minimal impact to the rest of LIBNVDIMM.
In fact, a vendor may also want to have a vendor-specific region-driver
(outside of nd_region). For example, if a vendor defined its own LABEL
format it would need its own region driver to parse that LABEL and emit
the resulting namespaces. The output from module resolution is more
accurate than a region-name or region-devtype.
2. udev
^^^^^^^
The kernel "devtype" is registered in the udev database::
# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
P: /devices/platform/nfit_test.0/ndbus0/region0
E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
E: DEVTYPE=nd_pmem
E: MODALIAS=nd:t2
E: SUBSYSTEM=nd
# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
P: /devices/platform/nfit_test.0/ndbus0/region4
E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
E: DEVTYPE=nd_blk
E: MODALIAS=nd:t3
E: SUBSYSTEM=nd
...and is available as a region attribute, but keep in mind that the
"devtype" does not indicate sub-type variations and scripts should
really be understanding the other attributes.
3. type specific attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^
As it currently stands a BLK-aperture region will never have a
"nfit/spa_index" attribute, but neither will a non-NFIT PMEM region. A
BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
that does not allow I/O. A PMEM region with a "mappings" value of zero
is a simple system-physical-address range.
LIBNVDIMM/LIBNDCTL: Namespace
-----------------------------
A REGION, after resolving DPA aliasing and LABEL specified boundaries,
surfaces one or more "namespace" devices. The arrival of a "namespace"
device currently triggers either the nd_blk or nd_pmem driver to load
and register a disk/block device.
A REGION, after resolving DPA aliasing and LABEL specified boundaries, surfaces
one or more "namespace" devices. The arrival of a "namespace" device currently
triggers the nd_pmem driver to load and register a disk/block device.
LIBNVDIMM: namespace
^^^^^^^^^^^^^^^^^^^^
Here is a sample layout from the three major types of NAMESPACE where
namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
attribute), namespace2.0 represents a BLK namespace (note it has a
'sector_size' attribute) that, and namespace6.0 represents an anonymous
PMEM namespace (note that has no 'uuid' attribute due to not support a
LABEL)::
Here is a sample layout from the 2 major types of NAMESPACE where namespace0.0
represents DIMM-info-backed PMEM (note that it has a 'uuid' attribute), and
namespace1.0 represents an anonymous PMEM namespace (note that has no 'uuid'
attribute due to not support a LABEL)
::
/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
|-- alt_name
......@@ -691,20 +482,7 @@ LABEL)::
|-- type
|-- uevent
`-- uuid
/sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
|-- alt_name
|-- devtype
|-- dpa_extents
|-- force_raw
|-- modalias
|-- numa_node
|-- sector_size
|-- size
|-- subsystem -> ../../../../../../bus/nd
|-- type
|-- uevent
`-- uuid
/sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
/sys/devices/platform/nfit_test.1/ndbus1/region1/namespace1.0
|-- block
| `-- pmem0
|-- devtype
......@@ -786,9 +564,9 @@ Why the Term "namespace"?
LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
-------------------------------------------------
A BTT (design document: https://pmem.io/2014/09/23/btt.html) is a stacked
block device driver that fronts either the whole block device or a
partition of a block device emitted by either a PMEM or BLK NAMESPACE.
A BTT (design document: https://pmem.io/2014/09/23/btt.html) is a
personality driver for a namespace that fronts entire namespace as an
'address abstraction'.
LIBNVDIMM: btt layout
^^^^^^^^^^^^^^^^^^^^^
......@@ -815,7 +593,9 @@ LIBNDCTL: btt creation example
Similar to namespaces an idle BTT device is automatically created per
region. Each time this "seed" btt device is configured and enabled a new
seed is created. Creating a BTT configuration involves two steps of
finding and idle BTT and assigning it to consume a PMEM or BLK namespace::
finding and idle BTT and assigning it to consume a namespace.
::
static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
{
......@@ -863,25 +643,15 @@ For the given example above, here is the view of the objects as seen by the
LIBNDCTL API::
+---+
|CTX| +---------+ +--------------+ +---------------+
+-+-+ +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
| | +---------+ +--------------+ +---------------+
+-------+ | | +---------+ +--------------+ +---------------+
| DIMM0 <-+ | +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
+-------+ | | | +---------+ +--------------+ +---------------+
|CTX|
+-+-+
|
+-------+ |
| DIMM0 <-+ | +---------+ +--------------+ +---------------+
+-------+ | | +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
| DIMM1 <-+ +-v--+ | +---------+ +--------------+ +---------------+
+-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6 "blk2.0" |
| DIMM2 <-+ +----+ | +---------+ | +--------------+ +----------------------+
+-------+ | | +-> NAMESPACE2.1 +--> ND5 "blk2.1" | BTT2 |
| DIMM3 <-+ | +--------------+ +----------------------+
+-------+ | +---------+ +--------------+ +---------------+
+-> REGION3 +-+-> NAMESPACE3.0 +--> ND4 "blk3.0" |
| +---------+ | +--------------+ +----------------------+
| +-> NAMESPACE3.1 +--> ND3 "blk3.1" | BTT1 |
| +--------------+ +----------------------+
| +---------+ +--------------+ +---------------+
+-> REGION4 +---> NAMESPACE4.0 +--> ND2 "blk4.0" |
| +---------+ +--------------+ +---------------+
| +---------+ +--------------+ +----------------------+
+-> REGION5 +---> NAMESPACE5.0 +--> ND1 "blk5.0" | BTT0 |
+---------+ +--------------+ +---------------+------+
+-------+ +-+BUS0+-| +---------+ +--------------+ +----------------------+
| DIMM2 <-+ +----+ +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" | BTT1 |
+-------+ | | +---------+ +--------------+ +---------------+------+
| DIMM3 <-+
+-------+
......@@ -11121,17 +11121,6 @@ F: drivers/ata/
F: include/linux/ata.h
F: include/linux/libata.h
LIBNVDIMM BLK: MMIO-APERTURE DRIVER
M: Dan Williams <dan.j.williams@intel.com>
M: Vishal Verma <vishal.l.verma@intel.com>
M: Dave Jiang <dave.jiang@intel.com>
L: nvdimm@lists.linux.dev
S: Supported
Q: https://patchwork.kernel.org/project/linux-nvdimm/list/
P: Documentation/nvdimm/maintainer-entry-profile.rst
F: drivers/nvdimm/blk.c
F: drivers/nvdimm/region_devs.c
LIBNVDIMM BTT: BLOCK TRANSLATION TABLE
M: Vishal Verma <vishal.l.verma@intel.com>
M: Dan Williams <dan.j.williams@intel.com>
......
......@@ -48,6 +48,11 @@ struct dev_archdata {
struct pdev_archdata {
u64 dma_mask;
/*
* Pointer to nvdimm_pmu structure, to handle the unregistering
* of pmu device
*/
void *priv;
};
#endif /* _ASM_POWERPC_DEVICE_H */
......@@ -19,6 +19,7 @@
#include <asm/papr_pdsm.h>
#include <asm/mce.h>
#include <asm/unaligned.h>
#include <linux/perf_event.h>
#define BIND_ANY_ADDR (~0ul)
......@@ -124,6 +125,8 @@ struct papr_scm_priv {
/* The bits which needs to be overridden */
u64 health_bitmap_inject_mask;
/* array to have event_code and stat_id mappings */
char **nvdimm_events_map;
};
static int papr_scm_pmem_flush(struct nd_region *nd_region,
......@@ -344,6 +347,225 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
return 0;
}
#ifdef CONFIG_PERF_EVENTS
#define to_nvdimm_pmu(_pmu) container_of(_pmu, struct nvdimm_pmu, pmu)
static int papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
{
struct papr_scm_perf_stat *stat;
struct papr_scm_perf_stats *stats;
struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
int rc, size;
/* Allocate request buffer enough to hold single performance stat */
size = sizeof(struct papr_scm_perf_stats) +
sizeof(struct papr_scm_perf_stat);
if (!p || !p->nvdimm_events_map)
return -EINVAL;
stats = kzalloc(size, GFP_KERNEL);
if (!stats)
return -ENOMEM;
stat = &stats->scm_statistic[0];
memcpy(&stat->stat_id,
p->nvdimm_events_map[event->attr.config],
sizeof(stat->stat_id));
stat->stat_val = 0;
rc = drc_pmem_query_stats(p, stats, 1);
if (rc < 0) {
kfree(stats);
return rc;
}
*count = be64_to_cpu(stat->stat_val);
kfree(stats);
return 0;
}
static int papr_scm_pmu_event_init(struct perf_event *event)
{
struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
struct papr_scm_priv *p;
if (!nd_pmu)
return -EINVAL;
/* test the event attr type for PMU enumeration */
if (event->attr.type != event->pmu->type)
return -ENOENT;
/* it does not support event sampling mode */
if (is_sampling_event(event))
return -EOPNOTSUPP;
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
if (!p)
return -EINVAL;
/* Invalid eventcode */
if (event->attr.config == 0 || event->attr.config > 16)
return -EINVAL;
return 0;
}
static int papr_scm_pmu_add(struct perf_event *event, int flags)
{
u64 count;
int rc;
struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
if (!nd_pmu)
return -EINVAL;
if (flags & PERF_EF_START) {
rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &count);
if (rc)
return rc;
local64_set(&event->hw.prev_count, count);
}
return 0;
}
static void papr_scm_pmu_read(struct perf_event *event)
{
u64 prev, now;
int rc;
struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
if (!nd_pmu)
return;
rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &now);
if (rc)
return;
prev = local64_xchg(&event->hw.prev_count, now);
local64_add(now - prev, &event->count);
}
static void papr_scm_pmu_del(struct perf_event *event, int flags)
{
papr_scm_pmu_read(event);
}
static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu *nd_pmu)
{
struct papr_scm_perf_stat *stat;
struct papr_scm_perf_stats *stats;
char *statid;
int index, rc, count;
u32 available_events;
if (!p->stat_buffer_len)
return -ENOENT;
available_events = (p->stat_buffer_len - sizeof(struct papr_scm_perf_stats))
/ sizeof(struct papr_scm_perf_stat);
/* Allocate the buffer for phyp where stats are written */
stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
if (!stats) {
rc = -ENOMEM;
return rc;
}
/* Allocate memory to nvdimm_event_map */
p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), GFP_KERNEL);
if (!p->nvdimm_events_map) {
rc = -ENOMEM;
goto out_stats;
}
/* Called to get list of events supported */
rc = drc_pmem_query_stats(p, stats, 0);
if (rc)
goto out_nvdimm_events_map;
for (index = 0, stat = stats->scm_statistic, count = 0;
index < available_events; index++, ++stat) {
statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
if (!statid) {
rc = -ENOMEM;
goto out_nvdimm_events_map;
}
strcpy(statid, stat->stat_id);
p->nvdimm_events_map[count] = statid;
count++;
}
p->nvdimm_events_map[count] = NULL;
kfree(stats);
return 0;
out_nvdimm_events_map:
kfree(p->nvdimm_events_map);
out_stats:
kfree(stats);
return rc;
}
static void papr_scm_pmu_register(struct papr_scm_priv *p)
{
struct nvdimm_pmu *nd_pmu;
int rc, nodeid;
nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
if (!nd_pmu) {
rc = -ENOMEM;
goto pmu_err_print;
}
rc = papr_scm_pmu_check_events(p, nd_pmu);
if (rc)
goto pmu_check_events_err;
nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
nd_pmu->pmu.name = nvdimm_name(p->nvdimm);
nd_pmu->pmu.event_init = papr_scm_pmu_event_init;
nd_pmu->pmu.read = papr_scm_pmu_read;
nd_pmu->pmu.add = papr_scm_pmu_add;
nd_pmu->pmu.del = papr_scm_pmu_del;
nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
PERF_PMU_CAP_NO_EXCLUDE;
/*updating the cpumask variable */
nodeid = numa_map_to_online_node(dev_to_node(&p->pdev->dev));
nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
rc = register_nvdimm_pmu(nd_pmu, p->pdev);
if (rc)
goto pmu_register_err;
/*
* Set archdata.priv value to nvdimm_pmu structure, to handle the
* unregistering of pmu device.
*/
p->pdev->archdata.priv = nd_pmu;
return;
pmu_register_err:
kfree(p->nvdimm_events_map);
pmu_check_events_err:
kfree(nd_pmu);
pmu_err_print:
dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
}
#else
static void papr_scm_pmu_register(struct papr_scm_priv *p) { }
#endif
/*
* Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
* health information.
......@@ -1320,6 +1542,7 @@ static int papr_scm_probe(struct platform_device *pdev)
goto err2;
platform_set_drvdata(pdev, p);
papr_scm_pmu_register(p);
return 0;
......@@ -1338,6 +1561,12 @@ static int papr_scm_remove(struct platform_device *pdev)
nvdimm_bus_unregister(p->bus);
drc_pmem_unbind(p);
if (pdev->archdata.priv)
unregister_nvdimm_pmu(pdev->archdata.priv);
pdev->archdata.priv = NULL;
kfree(p->nvdimm_events_map);
kfree(p->bus_desc.provider_name);
kfree(p);
......
......@@ -999,80 +999,6 @@ static void *add_table(struct acpi_nfit_desc *acpi_desc,
return table + hdr->length;
}
static void nfit_mem_find_spa_bdw(struct acpi_nfit_desc *acpi_desc,
struct nfit_mem *nfit_mem)
{
u32 device_handle = __to_nfit_memdev(nfit_mem)->device_handle;
u16 dcr = nfit_mem->dcr->region_index;
struct nfit_spa *nfit_spa;
list_for_each_entry(nfit_spa, &acpi_desc->spas, list) {
u16 range_index = nfit_spa->spa->range_index;
int type = nfit_spa_type(nfit_spa->spa);
struct nfit_memdev *nfit_memdev;
if (type != NFIT_SPA_BDW)
continue;
list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
if (nfit_memdev->memdev->range_index != range_index)
continue;
if (nfit_memdev->memdev->device_handle != device_handle)
continue;
if (nfit_memdev->memdev->region_index != dcr)
continue;
nfit_mem->spa_bdw = nfit_spa->spa;
return;
}
}
dev_dbg(acpi_desc->dev, "SPA-BDW not found for SPA-DCR %d\n",
nfit_mem->spa_dcr->range_index);
nfit_mem->bdw = NULL;
}
static void nfit_mem_init_bdw(struct acpi_nfit_desc *acpi_desc,
struct nfit_mem *nfit_mem, struct acpi_nfit_system_address *spa)
{
u16 dcr = __to_nfit_memdev(nfit_mem)->region_index;
struct nfit_memdev *nfit_memdev;
struct nfit_bdw *nfit_bdw;
struct nfit_idt *nfit_idt;
u16 idt_idx, range_index;
list_for_each_entry(nfit_bdw, &acpi_desc->bdws, list) {
if (nfit_bdw->bdw->region_index != dcr)
continue;
nfit_mem->bdw = nfit_bdw->bdw;
break;
}
if (!nfit_mem->bdw)
return;
nfit_mem_find_spa_bdw(acpi_desc, nfit_mem);
if (!nfit_mem->spa_bdw)
return;
range_index = nfit_mem->spa_bdw->range_index;
list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
if (nfit_memdev->memdev->range_index != range_index ||
nfit_memdev->memdev->region_index != dcr)
continue;
nfit_mem->memdev_bdw = nfit_memdev->memdev;
idt_idx = nfit_memdev->memdev->interleave_index;
list_for_each_entry(nfit_idt, &acpi_desc->idts, list) {
if (nfit_idt->idt->interleave_index != idt_idx)
continue;
nfit_mem->idt_bdw = nfit_idt->idt;
break;
}
break;
}
}
static int __nfit_mem_init(struct acpi_nfit_desc *acpi_desc,
struct acpi_nfit_system_address *spa)
{
......@@ -1189,7 +1115,6 @@ static int __nfit_mem_init(struct acpi_nfit_desc *acpi_desc,
nfit_mem->idt_dcr = nfit_idt->idt;
break;
}
nfit_mem_init_bdw(acpi_desc, nfit_mem, spa);
} else if (type == NFIT_SPA_PM) {
/*
* A single dimm may belong to multiple SPA-PM
......@@ -1532,8 +1457,6 @@ static int num_nvdimm_formats(struct nvdimm *nvdimm)
if (nfit_mem->memdev_pmem)
formats++;
if (nfit_mem->memdev_bdw)
formats++;
return formats;
}
......@@ -2079,11 +2002,6 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
continue;
}
if (nfit_mem->bdw && nfit_mem->memdev_pmem) {
set_bit(NDD_ALIASING, &flags);
set_bit(NDD_LABELING, &flags);
}
/* collate flags across all memdevs for this dimm */
list_for_each_entry(nfit_memdev, &acpi_desc->memdevs, list) {
struct acpi_nfit_memory_map *dimm_memdev;
......@@ -2118,10 +2036,6 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc)
cmd_mask |= nfit_mem->dsm_mask & NVDIMM_STANDARD_CMDMASK;
}
/* Quirk to ignore LOCAL for labels on HYPERV DIMMs */
if (nfit_mem->family == NVDIMM_FAMILY_HYPERV)
set_bit(NDD_NOBLK, &flags);
if (test_bit(NFIT_MEM_LSR, &nfit_mem->flags)) {
set_bit(ND_CMD_GET_CONFIG_SIZE, &cmd_mask);
set_bit(ND_CMD_GET_CONFIG_DATA, &cmd_mask);
......@@ -2429,272 +2343,6 @@ static int acpi_nfit_init_interleave_set(struct acpi_nfit_desc *acpi_desc,
return 0;
}
static u64 to_interleave_offset(u64 offset, struct nfit_blk_mmio *mmio)
{
struct acpi_nfit_interleave *idt = mmio->idt;
u32 sub_line_offset, line_index, line_offset;
u64 line_no, table_skip_count, table_offset;
line_no = div_u64_rem(offset, mmio->line_size, &sub_line_offset);
table_skip_count = div_u64_rem(line_no, mmio->num_lines, &line_index);
line_offset = idt->line_offset[line_index]
* mmio->line_size;
table_offset = table_skip_count * mmio->table_size;
return mmio->base_offset + line_offset + table_offset + sub_line_offset;
}
static u32 read_blk_stat(struct nfit_blk *nfit_blk, unsigned int bw)
{
struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR];
u64 offset = nfit_blk->stat_offset + mmio->size * bw;
const u32 STATUS_MASK = 0x80000037;
if (mmio->num_lines)
offset = to_interleave_offset(offset, mmio);
return readl(mmio->addr.base + offset) & STATUS_MASK;
}
static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
resource_size_t dpa, unsigned int len, unsigned int write)
{
u64 cmd, offset;
struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR];
enum {
BCW_OFFSET_MASK = (1ULL << 48)-1,
BCW_LEN_SHIFT = 48,
BCW_LEN_MASK = (1ULL << 8) - 1,
BCW_CMD_SHIFT = 56,
};
cmd = (dpa >> L1_CACHE_SHIFT) & BCW_OFFSET_MASK;
len = len >> L1_CACHE_SHIFT;
cmd |= ((u64) len & BCW_LEN_MASK) << BCW_LEN_SHIFT;
cmd |= ((u64) write) << BCW_CMD_SHIFT;
offset = nfit_blk->cmd_offset + mmio->size * bw;
if (mmio->num_lines)
offset = to_interleave_offset(offset, mmio);
writeq(cmd, mmio->addr.base + offset);
nvdimm_flush(nfit_blk->nd_region, NULL);
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
}
static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
resource_size_t dpa, void *iobuf, size_t len, int rw,
unsigned int lane)
{
struct nfit_blk_mmio *mmio = &nfit_blk->mmio[BDW];
unsigned int copied = 0;
u64 base_offset;
int rc;
base_offset = nfit_blk->bdw_offset + dpa % L1_CACHE_BYTES
+ lane * mmio->size;
write_blk_ctl(nfit_blk, lane, dpa, len, rw);
while (len) {
unsigned int c;
u64 offset;
if (mmio->num_lines) {
u32 line_offset;
offset = to_interleave_offset(base_offset + copied,
mmio);
div_u64_rem(offset, mmio->line_size, &line_offset);
c = min_t(size_t, len, mmio->line_size - line_offset);
} else {
offset = base_offset + nfit_blk->bdw_offset;
c = len;
}
if (rw)
memcpy_flushcache(mmio->addr.aperture + offset, iobuf + copied, c);
else {
if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH)
arch_invalidate_pmem((void __force *)
mmio->addr.aperture + offset, c);
memcpy(iobuf + copied, mmio->addr.aperture + offset, c);
}
copied += c;
len -= c;
}
if (rw)
nvdimm_flush(nfit_blk->nd_region, NULL);
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
}
static int acpi_nfit_blk_region_do_io(struct nd_blk_region *ndbr,
resource_size_t dpa, void *iobuf, u64 len, int rw)
{
struct nfit_blk *nfit_blk = nd_blk_region_provider_data(ndbr);
struct nfit_blk_mmio *mmio = &nfit_blk->mmio[BDW];
struct nd_region *nd_region = nfit_blk->nd_region;
unsigned int lane, copied = 0;
int rc = 0;
lane = nd_region_acquire_lane(nd_region);
while (len) {
u64 c = min(len, mmio->size);
rc = acpi_nfit_blk_single_io(nfit_blk, dpa + copied,
iobuf + copied, c, rw, lane);
if (rc)
break;
copied += c;
len -= c;
}
nd_region_release_lane(nd_region, lane);
return rc;
}
static int nfit_blk_init_interleave(struct nfit_blk_mmio *mmio,
struct acpi_nfit_interleave *idt, u16 interleave_ways)
{
if (idt) {
mmio->num_lines = idt->line_count;
mmio->line_size = idt->line_size;
if (interleave_ways == 0)
return -ENXIO;
mmio->table_size = mmio->num_lines * interleave_ways
* mmio->line_size;
}
return 0;
}
static int acpi_nfit_blk_get_flags(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, struct nfit_blk *nfit_blk)
{
struct nd_cmd_dimm_flags flags;
int rc;
memset(&flags, 0, sizeof(flags));
rc = nd_desc->ndctl(nd_desc, nvdimm, ND_CMD_DIMM_FLAGS, &flags,
sizeof(flags), NULL);
if (rc >= 0 && flags.status == 0)
nfit_blk->dimm_flags = flags.flags;
else if (rc == -ENOTTY) {
/* fall back to a conservative default */
nfit_blk->dimm_flags = NFIT_BLK_DCR_LATCH | NFIT_BLK_READ_FLUSH;
rc = 0;
} else
rc = -ENXIO;
return rc;
}
static int acpi_nfit_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
struct device *dev)
{
struct nvdimm_bus_descriptor *nd_desc = to_nd_desc(nvdimm_bus);
struct nd_blk_region *ndbr = to_nd_blk_region(dev);
struct nfit_blk_mmio *mmio;
struct nfit_blk *nfit_blk;
struct nfit_mem *nfit_mem;
struct nvdimm *nvdimm;
int rc;
nvdimm = nd_blk_region_to_dimm(ndbr);
nfit_mem = nvdimm_provider_data(nvdimm);
if (!nfit_mem || !nfit_mem->dcr || !nfit_mem->bdw) {
dev_dbg(dev, "missing%s%s%s\n",
nfit_mem ? "" : " nfit_mem",
(nfit_mem && nfit_mem->dcr) ? "" : " dcr",
(nfit_mem && nfit_mem->bdw) ? "" : " bdw");
return -ENXIO;
}
nfit_blk = devm_kzalloc(dev, sizeof(*nfit_blk), GFP_KERNEL);
if (!nfit_blk)
return -ENOMEM;
nd_blk_region_set_provider_data(ndbr, nfit_blk);
nfit_blk->nd_region = to_nd_region(dev);
/* map block aperture memory */
nfit_blk->bdw_offset = nfit_mem->bdw->offset;
mmio = &nfit_blk->mmio[BDW];
mmio->addr.base = devm_nvdimm_memremap(dev, nfit_mem->spa_bdw->address,
nfit_mem->spa_bdw->length, nd_blk_memremap_flags(ndbr));
if (!mmio->addr.base) {
dev_dbg(dev, "%s failed to map bdw\n",
nvdimm_name(nvdimm));
return -ENOMEM;
}
mmio->size = nfit_mem->bdw->size;
mmio->base_offset = nfit_mem->memdev_bdw->region_offset;
mmio->idt = nfit_mem->idt_bdw;
mmio->spa = nfit_mem->spa_bdw;
rc = nfit_blk_init_interleave(mmio, nfit_mem->idt_bdw,
nfit_mem->memdev_bdw->interleave_ways);
if (rc) {
dev_dbg(dev, "%s failed to init bdw interleave\n",
nvdimm_name(nvdimm));
return rc;
}
/* map block control memory */
nfit_blk->cmd_offset = nfit_mem->dcr->command_offset;
nfit_blk->stat_offset = nfit_mem->dcr->status_offset;
mmio = &nfit_blk->mmio[DCR];
mmio->addr.base = devm_nvdimm_ioremap(dev, nfit_mem->spa_dcr->address,
nfit_mem->spa_dcr->length);
if (!mmio->addr.base) {
dev_dbg(dev, "%s failed to map dcr\n",
nvdimm_name(nvdimm));
return -ENOMEM;
}
mmio->size = nfit_mem->dcr->window_size;
mmio->base_offset = nfit_mem->memdev_dcr->region_offset;
mmio->idt = nfit_mem->idt_dcr;
mmio->spa = nfit_mem->spa_dcr;
rc = nfit_blk_init_interleave(mmio, nfit_mem->idt_dcr,
nfit_mem->memdev_dcr->interleave_ways);
if (rc) {
dev_dbg(dev, "%s failed to init dcr interleave\n",
nvdimm_name(nvdimm));
return rc;
}
rc = acpi_nfit_blk_get_flags(nd_desc, nvdimm, nfit_blk);
if (rc < 0) {
dev_dbg(dev, "%s failed get DIMM flags\n",
nvdimm_name(nvdimm));
return rc;
}
if (nvdimm_has_flush(nfit_blk->nd_region) < 0)
dev_warn(dev, "unable to guarantee persistence of writes\n");
if (mmio->line_size == 0)
return 0;
if ((u32) nfit_blk->cmd_offset % mmio->line_size
+ 8 > mmio->line_size) {
dev_dbg(dev, "cmd_offset crosses interleave boundary\n");
return -ENXIO;
} else if ((u32) nfit_blk->stat_offset % mmio->line_size
+ 8 > mmio->line_size) {
dev_dbg(dev, "stat_offset crosses interleave boundary\n");
return -ENXIO;
}
return 0;
}
static int ars_get_cap(struct acpi_nfit_desc *acpi_desc,
struct nd_cmd_ars_cap *cmd, struct nfit_spa *nfit_spa)
{
......@@ -2911,9 +2559,6 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
struct nvdimm *nvdimm = acpi_nfit_dimm_by_handle(acpi_desc,
memdev->device_handle);
struct acpi_nfit_system_address *spa = nfit_spa->spa;
struct nd_blk_region_desc *ndbr_desc;
struct nfit_mem *nfit_mem;
int rc;
if (!nvdimm) {
dev_err(acpi_desc->dev, "spa%d dimm: %#x not found\n",
......@@ -2928,30 +2573,6 @@ static int acpi_nfit_init_mapping(struct acpi_nfit_desc *acpi_desc,
mapping->start = memdev->address;
mapping->size = memdev->region_size;
break;
case NFIT_SPA_DCR:
nfit_mem = nvdimm_provider_data(nvdimm);
if (!nfit_mem || !nfit_mem->bdw) {
dev_dbg(acpi_desc->dev, "spa%d %s missing bdw\n",
spa->range_index, nvdimm_name(nvdimm));
break;
}
mapping->size = nfit_mem->bdw->capacity;
mapping->start = nfit_mem->bdw->start_address;
ndr_desc->num_lanes = nfit_mem->bdw->windows;
ndr_desc->mapping = mapping;
ndr_desc->num_mappings = 1;
ndbr_desc = to_blk_region_desc(ndr_desc);
ndbr_desc->enable = acpi_nfit_blk_region_enable;
ndbr_desc->do_io = acpi_desc->blk_do_io;
rc = acpi_nfit_init_interleave_set(acpi_desc, ndr_desc, spa);
if (rc)
return rc;
nfit_spa->nd_region = nvdimm_blk_region_create(acpi_desc->nvdimm_bus,
ndr_desc);
if (!nfit_spa->nd_region)
return -ENOMEM;
break;
}
return 0;
......@@ -2977,8 +2598,7 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
{
static struct nd_mapping_desc mappings[ND_MAX_MAPPINGS];
struct acpi_nfit_system_address *spa = nfit_spa->spa;
struct nd_blk_region_desc ndbr_desc;
struct nd_region_desc *ndr_desc;
struct nd_region_desc *ndr_desc, _ndr_desc;
struct nfit_memdev *nfit_memdev;
struct nvdimm_bus *nvdimm_bus;
struct resource res;
......@@ -2994,10 +2614,10 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
memset(&res, 0, sizeof(res));
memset(&mappings, 0, sizeof(mappings));
memset(&ndbr_desc, 0, sizeof(ndbr_desc));
memset(&_ndr_desc, 0, sizeof(_ndr_desc));
res.start = spa->address;
res.end = res.start + spa->length - 1;
ndr_desc = &ndbr_desc.ndr_desc;
ndr_desc = &_ndr_desc;
ndr_desc->res = &res;
ndr_desc->provider_data = nfit_spa;
ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
......@@ -3635,7 +3255,6 @@ void acpi_nfit_desc_init(struct acpi_nfit_desc *acpi_desc, struct device *dev)
dev_set_drvdata(dev, acpi_desc);
acpi_desc->dev = dev;
acpi_desc->blk_do_io = acpi_nfit_blk_region_do_io;
nd_desc = &acpi_desc->nd_desc;
nd_desc->provider_name = "ACPI.NFIT";
nd_desc->module = THIS_MODULE;
......
......@@ -208,13 +208,9 @@ struct nfit_mem {
struct nvdimm *nvdimm;
struct acpi_nfit_memory_map *memdev_dcr;
struct acpi_nfit_memory_map *memdev_pmem;
struct acpi_nfit_memory_map *memdev_bdw;
struct acpi_nfit_control_region *dcr;
struct acpi_nfit_data_region *bdw;
struct acpi_nfit_system_address *spa_dcr;
struct acpi_nfit_system_address *spa_bdw;
struct acpi_nfit_interleave *idt_dcr;
struct acpi_nfit_interleave *idt_bdw;
struct kernfs_node *flags_attr;
struct nfit_flush *nfit_flush;
struct list_head list;
......@@ -266,8 +262,6 @@ struct acpi_nfit_desc {
unsigned long family_dsm_mask[NVDIMM_BUS_FAMILY_MAX + 1];
unsigned int platform_cap;
unsigned int scrub_tmo;
int (*blk_do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
void *iobuf, u64 len, int rw);
enum nvdimm_fwa_state fwa_state;
enum nvdimm_fwa_capability fwa_cap;
int fwa_count;
......
......@@ -10,12 +10,9 @@ menuconfig LIBNVDIMM
ACPI-6-NFIT defined resources. On platforms that define an
NFIT, or otherwise can discover NVDIMM resources, a libnvdimm
bus is registered to advertise PMEM (persistent memory)
namespaces (/dev/pmemX) and BLK (sliding mmio window(s))
namespaces (/dev/ndblkX.Y). A PMEM namespace refers to a
namespaces (/dev/pmemX). A PMEM namespace refers to a
memory resource that may span multiple DIMMs and support DAX
(see CONFIG_DAX). A BLK namespace refers to an NVDIMM control
region which exposes an mmio register set for windowed access
mode to non-volatile memory.
(see CONFIG_DAX).
if LIBNVDIMM
......@@ -38,19 +35,6 @@ config BLK_DEV_PMEM
Say Y if you want to use an NVDIMM
config ND_BLK
tristate "BLK: Block data window (aperture) device support"
default LIBNVDIMM
select ND_BTT if BTT
help
Support NVDIMMs, or other devices, that implement a BLK-mode
access capability. BLK-mode access uses memory-mapped-i/o
apertures to access persistent media.
Say Y if your platform firmware emits an ACPI.NFIT table
(CONFIG_ACPI_NFIT), or otherwise exposes BLK-mode
capabilities.
config ND_CLAIM
bool
......@@ -67,9 +51,8 @@ config BTT
applications that rely on sector writes not being torn (a
guarantee that typical disks provide) can continue to do so.
The BTT manifests itself as an alternate personality for an
NVDIMM namespace, i.e. a namespace can be in raw mode (pmemX,
ndblkX.Y, etc...), or 'sectored' mode, (pmemXs, ndblkX.Ys,
etc...).
NVDIMM namespace, i.e. a namespace can be in raw mode pmemX,
or 'sectored' mode.
Select Y if unsure
......
......@@ -2,7 +2,6 @@
obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o nd_virtio.o
......@@ -11,13 +10,12 @@ nd_pmem-y := pmem.o
nd_btt-y := btt.o
nd_blk-y := blk.o
nd_e820-y := e820.o
libnvdimm-y := core.o
libnvdimm-y += bus.o
libnvdimm-y += dimm_devs.o
libnvdimm-$(CONFIG_PERF_EVENTS) += nd_perf.o
libnvdimm-y += dimm.o
libnvdimm-y += region_devs.o
libnvdimm-y += region.o
......
// SPDX-License-Identifier: GPL-2.0-only
/*
* NVDIMM Block Window Driver
* Copyright (c) 2014, Intel Corporation.
*/
#include <linux/blkdev.h>
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/nd.h>
#include <linux/sizes.h>
#include "nd.h"
static u32 nsblk_meta_size(struct nd_namespace_blk *nsblk)
{
return nsblk->lbasize - ((nsblk->lbasize >= 4096) ? 4096 : 512);
}
static u32 nsblk_internal_lbasize(struct nd_namespace_blk *nsblk)
{
return roundup(nsblk->lbasize, INT_LBASIZE_ALIGNMENT);
}
static u32 nsblk_sector_size(struct nd_namespace_blk *nsblk)
{
return nsblk->lbasize - nsblk_meta_size(nsblk);
}
static resource_size_t to_dev_offset(struct nd_namespace_blk *nsblk,
resource_size_t ns_offset, unsigned int len)
{
int i;
for (i = 0; i < nsblk->num_resources; i++) {
if (ns_offset < resource_size(nsblk->res[i])) {
if (ns_offset + len > resource_size(nsblk->res[i])) {
dev_WARN_ONCE(&nsblk->common.dev, 1,
"illegal request\n");
return SIZE_MAX;
}
return nsblk->res[i]->start + ns_offset;
}
ns_offset -= resource_size(nsblk->res[i]);
}
dev_WARN_ONCE(&nsblk->common.dev, 1, "request out of range\n");
return SIZE_MAX;
}
static struct nd_blk_region *to_ndbr(struct nd_namespace_blk *nsblk)
{
struct nd_region *nd_region;
struct device *parent;
parent = nsblk->common.dev.parent;
nd_region = container_of(parent, struct nd_region, dev);
return container_of(nd_region, struct nd_blk_region, nd_region);
}
#ifdef CONFIG_BLK_DEV_INTEGRITY
static int nd_blk_rw_integrity(struct nd_namespace_blk *nsblk,
struct bio_integrity_payload *bip, u64 lba, int rw)
{
struct nd_blk_region *ndbr = to_ndbr(nsblk);
unsigned int len = nsblk_meta_size(nsblk);
resource_size_t dev_offset, ns_offset;
u32 internal_lbasize, sector_size;
int err = 0;
internal_lbasize = nsblk_internal_lbasize(nsblk);
sector_size = nsblk_sector_size(nsblk);
ns_offset = lba * internal_lbasize + sector_size;
dev_offset = to_dev_offset(nsblk, ns_offset, len);
if (dev_offset == SIZE_MAX)
return -EIO;
while (len) {
unsigned int cur_len;
struct bio_vec bv;
void *iobuf;
bv = bvec_iter_bvec(bip->bip_vec, bip->bip_iter);
/*
* The 'bv' obtained from bvec_iter_bvec has its .bv_len and
* .bv_offset already adjusted for iter->bi_bvec_done, and we
* can use those directly
*/
cur_len = min(len, bv.bv_len);
iobuf = bvec_kmap_local(&bv);
err = ndbr->do_io(ndbr, dev_offset, iobuf, cur_len, rw);
kunmap_local(iobuf);
if (err)
return err;
len -= cur_len;
dev_offset += cur_len;
if (!bvec_iter_advance(bip->bip_vec, &bip->bip_iter, cur_len))
return -EIO;
}
return err;
}
#else /* CONFIG_BLK_DEV_INTEGRITY */
static int nd_blk_rw_integrity(struct nd_namespace_blk *nsblk,
struct bio_integrity_payload *bip, u64 lba, int rw)
{
return 0;
}
#endif
static int nsblk_do_bvec(struct nd_namespace_blk *nsblk,
struct bio_integrity_payload *bip, struct page *page,
unsigned int len, unsigned int off, int rw, sector_t sector)
{
struct nd_blk_region *ndbr = to_ndbr(nsblk);
resource_size_t dev_offset, ns_offset;
u32 internal_lbasize, sector_size;
int err = 0;
void *iobuf;
u64 lba;
internal_lbasize = nsblk_internal_lbasize(nsblk);
sector_size = nsblk_sector_size(nsblk);
while (len) {
unsigned int cur_len;
/*
* If we don't have an integrity payload, we don't have to
* split the bvec into sectors, as this would cause unnecessary
* Block Window setup/move steps. the do_io routine is capable
* of handling len <= PAGE_SIZE.
*/
cur_len = bip ? min(len, sector_size) : len;
lba = div_u64(sector << SECTOR_SHIFT, sector_size);
ns_offset = lba * internal_lbasize;
dev_offset = to_dev_offset(nsblk, ns_offset, cur_len);
if (dev_offset == SIZE_MAX)
return -EIO;
iobuf = kmap_atomic(page);
err = ndbr->do_io(ndbr, dev_offset, iobuf + off, cur_len, rw);
kunmap_atomic(iobuf);
if (err)
return err;
if (bip) {
err = nd_blk_rw_integrity(nsblk, bip, lba, rw);
if (err)
return err;
}
len -= cur_len;
off += cur_len;
sector += sector_size >> SECTOR_SHIFT;
}
return err;
}
static void nd_blk_submit_bio(struct bio *bio)
{
struct bio_integrity_payload *bip;
struct nd_namespace_blk *nsblk = bio->bi_bdev->bd_disk->private_data;
struct bvec_iter iter;
unsigned long start;
struct bio_vec bvec;
int err = 0, rw;
bool do_acct;
if (!bio_integrity_prep(bio))
return;
bip = bio_integrity(bio);
rw = bio_data_dir(bio);
do_acct = blk_queue_io_stat(bio->bi_bdev->bd_disk->queue);
if (do_acct)
start = bio_start_io_acct(bio);
bio_for_each_segment(bvec, bio, iter) {
unsigned int len = bvec.bv_len;
BUG_ON(len > PAGE_SIZE);
err = nsblk_do_bvec(nsblk, bip, bvec.bv_page, len,
bvec.bv_offset, rw, iter.bi_sector);
if (err) {
dev_dbg(&nsblk->common.dev,
"io error in %s sector %lld, len %d,\n",
(rw == READ) ? "READ" : "WRITE",
(unsigned long long) iter.bi_sector, len);
bio->bi_status = errno_to_blk_status(err);
break;
}
}
if (do_acct)
bio_end_io_acct(bio, start);
bio_endio(bio);
}
static int nsblk_rw_bytes(struct nd_namespace_common *ndns,
resource_size_t offset, void *iobuf, size_t n, int rw,
unsigned long flags)
{
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(&ndns->dev);
struct nd_blk_region *ndbr = to_ndbr(nsblk);
resource_size_t dev_offset;
dev_offset = to_dev_offset(nsblk, offset, n);
if (unlikely(offset + n > nsblk->size)) {
dev_WARN_ONCE(&ndns->dev, 1, "request out of range\n");
return -EFAULT;
}
if (dev_offset == SIZE_MAX)
return -EIO;
return ndbr->do_io(ndbr, dev_offset, iobuf, n, rw);
}
static const struct block_device_operations nd_blk_fops = {
.owner = THIS_MODULE,
.submit_bio = nd_blk_submit_bio,
};
static void nd_blk_release_disk(void *disk)
{
del_gendisk(disk);
blk_cleanup_disk(disk);
}
static int nsblk_attach_disk(struct nd_namespace_blk *nsblk)
{
struct device *dev = &nsblk->common.dev;
resource_size_t available_disk_size;
struct gendisk *disk;
u64 internal_nlba;
int rc;
internal_nlba = div_u64(nsblk->size, nsblk_internal_lbasize(nsblk));
available_disk_size = internal_nlba * nsblk_sector_size(nsblk);
disk = blk_alloc_disk(NUMA_NO_NODE);
if (!disk)
return -ENOMEM;
disk->fops = &nd_blk_fops;
disk->private_data = nsblk;
nvdimm_namespace_disk_name(&nsblk->common, disk->disk_name);
blk_queue_max_hw_sectors(disk->queue, UINT_MAX);
blk_queue_logical_block_size(disk->queue, nsblk_sector_size(nsblk));
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
if (nsblk_meta_size(nsblk)) {
rc = nd_integrity_init(disk, nsblk_meta_size(nsblk));
if (rc)
goto out_before_devm_err;
}
set_capacity(disk, available_disk_size >> SECTOR_SHIFT);
rc = device_add_disk(dev, disk, NULL);
if (rc)
goto out_before_devm_err;
/* nd_blk_release_disk() is called if this fails */
if (devm_add_action_or_reset(dev, nd_blk_release_disk, disk))
return -ENOMEM;
nvdimm_check_and_set_ro(disk);
return 0;
out_before_devm_err:
blk_cleanup_disk(disk);
return rc;
}
static int nd_blk_probe(struct device *dev)
{
struct nd_namespace_common *ndns;
struct nd_namespace_blk *nsblk;
ndns = nvdimm_namespace_common_probe(dev);
if (IS_ERR(ndns))
return PTR_ERR(ndns);
nsblk = to_nd_namespace_blk(&ndns->dev);
nsblk->size = nvdimm_namespace_capacity(ndns);
dev_set_drvdata(dev, nsblk);
ndns->rw_bytes = nsblk_rw_bytes;
if (is_nd_btt(dev))
return nvdimm_namespace_attach_btt(ndns);
else if (nd_btt_probe(dev, ndns) == 0) {
/* we'll come back as btt-blk */
return -ENXIO;
} else
return nsblk_attach_disk(nsblk);
}
static void nd_blk_remove(struct device *dev)
{
if (is_nd_btt(dev))
nvdimm_namespace_detach_btt(to_nd_btt(dev));
}
static struct nd_device_driver nd_blk_driver = {
.probe = nd_blk_probe,
.remove = nd_blk_remove,
.drv = {
.name = "nd_blk",
},
.type = ND_DRIVER_NAMESPACE_BLK,
};
static int __init nd_blk_init(void)
{
return nd_driver_register(&nd_blk_driver);
}
static void __exit nd_blk_exit(void)
{
driver_unregister(&nd_blk_driver.drv);
}
MODULE_AUTHOR("Ross Zwisler <ross.zwisler@linux.intel.com>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_BLK);
module_init(nd_blk_init);
module_exit(nd_blk_exit);
......@@ -34,8 +34,6 @@ static int to_nd_device_type(struct device *dev)
return ND_DEVICE_DIMM;
else if (is_memory(dev))
return ND_DEVICE_REGION_PMEM;
else if (is_nd_blk(dev))
return ND_DEVICE_REGION_BLK;
else if (is_nd_dax(dev))
return ND_DEVICE_DAX_PMEM;
else if (is_nd_region(dev->parent))
......
......@@ -18,10 +18,6 @@
static DEFINE_IDA(dimm_ida);
static bool noblk;
module_param(noblk, bool, 0444);
MODULE_PARM_DESC(noblk, "force disable BLK / local alias support");
/*
* Retrieve bus and dimm handle and return if this bus supports
* get_config_data commands
......@@ -211,22 +207,6 @@ struct nvdimm *to_nvdimm(struct device *dev)
}
EXPORT_SYMBOL_GPL(to_nvdimm);
struct nvdimm *nd_blk_region_to_dimm(struct nd_blk_region *ndbr)
{
struct nd_region *nd_region = &ndbr->nd_region;
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
return nd_mapping->nvdimm;
}
EXPORT_SYMBOL_GPL(nd_blk_region_to_dimm);
unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr)
{
/* pmem mapping properties are private to libnvdimm */
return ARCH_MEMREMAP_PMEM;
}
EXPORT_SYMBOL_GPL(nd_blk_memremap_flags);
struct nvdimm_drvdata *to_ndd(struct nd_mapping *nd_mapping)
{
struct nvdimm *nvdimm = nd_mapping->nvdimm;
......@@ -312,8 +292,7 @@ static ssize_t flags_show(struct device *dev,
{
struct nvdimm *nvdimm = to_nvdimm(dev);
return sprintf(buf, "%s%s%s\n",
test_bit(NDD_ALIASING, &nvdimm->flags) ? "alias " : "",
return sprintf(buf, "%s%s\n",
test_bit(NDD_LABELING, &nvdimm->flags) ? "label " : "",
test_bit(NDD_LOCKED, &nvdimm->flags) ? "lock " : "");
}
......@@ -612,8 +591,6 @@ struct nvdimm *__nvdimm_create(struct nvdimm_bus *nvdimm_bus,
nvdimm->dimm_id = dimm_id;
nvdimm->provider_data = provider_data;
if (noblk)
flags |= 1 << NDD_NOBLK;
nvdimm->flags = flags;
nvdimm->cmd_mask = cmd_mask;
nvdimm->num_flush = num_flush;
......@@ -726,133 +703,6 @@ static unsigned long dpa_align(struct nd_region *nd_region)
return nd_region->align / nd_region->ndr_mappings;
}
int alias_dpa_busy(struct device *dev, void *data)
{
resource_size_t map_end, blk_start, new;
struct blk_alloc_info *info = data;
struct nd_mapping *nd_mapping;
struct nd_region *nd_region;
struct nvdimm_drvdata *ndd;
struct resource *res;
unsigned long align;
int i;
if (!is_memory(dev))
return 0;
nd_region = to_nd_region(dev);
for (i = 0; i < nd_region->ndr_mappings; i++) {
nd_mapping = &nd_region->mapping[i];
if (nd_mapping->nvdimm == info->nd_mapping->nvdimm)
break;
}
if (i >= nd_region->ndr_mappings)
return 0;
ndd = to_ndd(nd_mapping);
map_end = nd_mapping->start + nd_mapping->size - 1;
blk_start = nd_mapping->start;
/*
* In the allocation case ->res is set to free space that we are
* looking to validate against PMEM aliasing collision rules
* (i.e. BLK is allocated after all aliased PMEM).
*/
if (info->res) {
if (info->res->start >= nd_mapping->start
&& info->res->start < map_end)
/* pass */;
else
return 0;
}
retry:
/*
* Find the free dpa from the end of the last pmem allocation to
* the end of the interleave-set mapping.
*/
align = dpa_align(nd_region);
if (!align)
return 0;
for_each_dpa_resource(ndd, res) {
resource_size_t start, end;
if (strncmp(res->name, "pmem", 4) != 0)
continue;
start = ALIGN_DOWN(res->start, align);
end = ALIGN(res->end + 1, align) - 1;
if ((start >= blk_start && start < map_end)
|| (end >= blk_start && end <= map_end)) {
new = max(blk_start, min(map_end, end) + 1);
if (new != blk_start) {
blk_start = new;
goto retry;
}
}
}
/* update the free space range with the probed blk_start */
if (info->res && blk_start > info->res->start) {
info->res->start = max(info->res->start, blk_start);
if (info->res->start > info->res->end)
info->res->end = info->res->start - 1;
return 1;
}
info->available -= blk_start - nd_mapping->start;
return 0;
}
/**
* nd_blk_available_dpa - account the unused dpa of BLK region
* @nd_mapping: container of dpa-resource-root + labels
*
* Unlike PMEM, BLK namespaces can occupy discontiguous DPA ranges, but
* we arrange for them to never start at an lower dpa than the last
* PMEM allocation in an aliased region.
*/
resource_size_t nd_blk_available_dpa(struct nd_region *nd_region)
{
struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(&nd_region->dev);
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct blk_alloc_info info = {
.nd_mapping = nd_mapping,
.available = nd_mapping->size,
.res = NULL,
};
struct resource *res;
unsigned long align;
if (!ndd)
return 0;
device_for_each_child(&nvdimm_bus->dev, &info, alias_dpa_busy);
/* now account for busy blk allocations in unaliased dpa */
align = dpa_align(nd_region);
if (!align)
return 0;
for_each_dpa_resource(ndd, res) {
resource_size_t start, end, size;
if (strncmp(res->name, "blk", 3) != 0)
continue;
start = ALIGN_DOWN(res->start, align);
end = ALIGN(res->end + 1, align) - 1;
size = end - start + 1;
if (size >= info.available)
return 0;
info.available -= size;
}
return info.available;
}
/**
* nd_pmem_max_contiguous_dpa - For the given dimm+region, return the max
* contiguous unallocated dpa range.
......@@ -900,24 +750,16 @@ resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region,
* nd_pmem_available_dpa - for the given dimm+region account unallocated dpa
* @nd_mapping: container of dpa-resource-root + labels
* @nd_region: constrain available space check to this reference region
* @overlap: calculate available space assuming this level of overlap
*
* Validate that a PMEM label, if present, aligns with the start of an
* interleave set and truncate the available size at the lowest BLK
* overlap point.
*
* The expectation is that this routine is called multiple times as it
* probes for the largest BLK encroachment for any single member DIMM of
* the interleave set. Once that value is determined the PMEM-limit for
* the set can be established.
* interleave set.
*/
resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, resource_size_t *overlap)
struct nd_mapping *nd_mapping)
{
resource_size_t map_start, map_end, busy = 0, available, blk_start;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
resource_size_t map_start, map_end, busy = 0;
struct resource *res;
const char *reason;
unsigned long align;
if (!ndd)
......@@ -929,46 +771,28 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
map_start = nd_mapping->start;
map_end = map_start + nd_mapping->size - 1;
blk_start = max(map_start, map_end + 1 - *overlap);
for_each_dpa_resource(ndd, res) {
resource_size_t start, end;
start = ALIGN_DOWN(res->start, align);
end = ALIGN(res->end + 1, align) - 1;
if (start >= map_start && start < map_end) {
if (strncmp(res->name, "blk", 3) == 0)
blk_start = min(blk_start,
max(map_start, start));
else if (end > map_end) {
reason = "misaligned to iset";
goto err;
} else
busy += end - start + 1;
if (end > map_end) {
nd_dbg_dpa(nd_region, ndd, res,
"misaligned to iset\n");
return 0;
}
busy += end - start + 1;
} else if (end >= map_start && end <= map_end) {
if (strncmp(res->name, "blk", 3) == 0) {
/*
* If a BLK allocation overlaps the start of
* PMEM the entire interleave set may now only
* be used for BLK.
*/
blk_start = map_start;
} else
busy += end - start + 1;
busy += end - start + 1;
} else if (map_start > start && map_start < end) {
/* total eclipse of the mapping */
busy += nd_mapping->size;
blk_start = map_start;
}
}
*overlap = map_end + 1 - blk_start;
available = blk_start - map_start;
if (busy < available)
return ALIGN_DOWN(available - busy, align);
return 0;
err:
nd_dbg_dpa(nd_region, ndd, res, "%s\n", reason);
if (busy < nd_mapping->size)
return ALIGN_DOWN(nd_mapping->size - busy, align);
return 0;
}
......@@ -999,7 +823,7 @@ struct resource *nvdimm_allocate_dpa(struct nvdimm_drvdata *ndd,
/**
* nvdimm_allocated_dpa - sum up the dpa currently allocated to this label_id
* @nvdimm: container of dpa-resource-root + labels
* @label_id: dpa resource name of the form {pmem|blk}-<human readable uuid>
* @label_id: dpa resource name of the form pmem-<human readable uuid>
*/
resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
struct nd_label_id *label_id)
......
......@@ -334,8 +334,7 @@ char *nd_label_gen_id(struct nd_label_id *label_id, const uuid_t *uuid,
{
if (!label_id || !uuid)
return NULL;
snprintf(label_id->id, ND_LABEL_ID_SIZE, "%s-%pUb",
flags & NSLABEL_FLAG_LOCAL ? "blk" : "pmem", uuid);
snprintf(label_id->id, ND_LABEL_ID_SIZE, "pmem-%pUb", uuid);
return label_id->id;
}
......@@ -406,7 +405,6 @@ int nd_label_reserve_dpa(struct nvdimm_drvdata *ndd)
return 0; /* no label, nothing to reserve */
for_each_clear_bit_le(slot, free, nslot) {
struct nvdimm *nvdimm = to_nvdimm(ndd->dev);
struct nd_namespace_label *nd_label;
struct nd_region *nd_region = NULL;
struct nd_label_id label_id;
......@@ -421,8 +419,6 @@ int nd_label_reserve_dpa(struct nvdimm_drvdata *ndd)
nsl_get_uuid(ndd, nd_label, &label_uuid);
flags = nsl_get_flags(ndd, nd_label);
if (test_bit(NDD_NOBLK, &nvdimm->flags))
flags &= ~NSLABEL_FLAG_LOCAL;
nd_label_gen_id(&label_id, &label_uuid, flags);
res = nvdimm_allocate_dpa(ndd, &label_id,
nsl_get_dpa(ndd, nd_label),
......@@ -968,326 +964,6 @@ static int __pmem_label_update(struct nd_region *nd_region,
return rc;
}
static bool is_old_resource(struct resource *res, struct resource **list, int n)
{
int i;
if (res->flags & DPA_RESOURCE_ADJUSTED)
return false;
for (i = 0; i < n; i++)
if (res == list[i])
return true;
return false;
}
static struct resource *to_resource(struct nvdimm_drvdata *ndd,
struct nd_namespace_label *nd_label)
{
struct resource *res;
for_each_dpa_resource(ndd, res) {
if (res->start != nsl_get_dpa(ndd, nd_label))
continue;
if (resource_size(res) != nsl_get_rawsize(ndd, nd_label))
continue;
return res;
}
return NULL;
}
/*
* Use the presence of the type_guid as a flag to determine isetcookie
* usage and nlabel + position policy for blk-aperture namespaces.
*/
static void nsl_set_blk_isetcookie(struct nvdimm_drvdata *ndd,
struct nd_namespace_label *nd_label,
u64 isetcookie)
{
if (efi_namespace_label_has(ndd, type_guid)) {
nsl_set_isetcookie(ndd, nd_label, isetcookie);
return;
}
nsl_set_isetcookie(ndd, nd_label, 0); /* N/A */
}
bool nsl_validate_blk_isetcookie(struct nvdimm_drvdata *ndd,
struct nd_namespace_label *nd_label,
u64 isetcookie)
{
if (!efi_namespace_label_has(ndd, type_guid))
return true;
if (nsl_get_isetcookie(ndd, nd_label) != isetcookie) {
dev_dbg(ndd->dev, "expect cookie %#llx got %#llx\n", isetcookie,
nsl_get_isetcookie(ndd, nd_label));
return false;
}
return true;
}
static void nsl_set_blk_nlabel(struct nvdimm_drvdata *ndd,
struct nd_namespace_label *nd_label, int nlabel,
bool first)
{
if (!efi_namespace_label_has(ndd, type_guid)) {
nsl_set_nlabel(ndd, nd_label, 0); /* N/A */
return;
}
nsl_set_nlabel(ndd, nd_label, first ? nlabel : 0xffff);
}
static void nsl_set_blk_position(struct nvdimm_drvdata *ndd,
struct nd_namespace_label *nd_label,
bool first)
{
if (!efi_namespace_label_has(ndd, type_guid)) {
nsl_set_position(ndd, nd_label, 0);
return;
}
nsl_set_position(ndd, nd_label, first ? 0 : 0xffff);
}
/*
* 1/ Account all the labels that can be freed after this update
* 2/ Allocate and write the label to the staging (next) index
* 3/ Record the resources in the namespace device
*/
static int __blk_label_update(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, struct nd_namespace_blk *nsblk,
int num_labels)
{
int i, alloc, victims, nfree, old_num_resources, nlabel, rc = -ENXIO;
struct nd_interleave_set *nd_set = nd_region->nd_set;
struct nd_namespace_common *ndns = &nsblk->common;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct nd_namespace_label *nd_label;
struct nd_label_ent *label_ent, *e;
struct nd_namespace_index *nsindex;
unsigned long *free, *victim_map = NULL;
struct resource *res, **old_res_list;
struct nd_label_id label_id;
int min_dpa_idx = 0;
LIST_HEAD(list);
u32 nslot, slot;
if (!preamble_next(ndd, &nsindex, &free, &nslot))
return -ENXIO;
old_res_list = nsblk->res;
nfree = nd_label_nfree(ndd);
old_num_resources = nsblk->num_resources;
nd_label_gen_id(&label_id, nsblk->uuid, NSLABEL_FLAG_LOCAL);
/*
* We need to loop over the old resources a few times, which seems a
* bit inefficient, but we need to know that we have the label
* space before we start mutating the tracking structures.
* Otherwise the recovery method of last resort for userspace is
* disable and re-enable the parent region.
*/
alloc = 0;
for_each_dpa_resource(ndd, res) {
if (strcmp(res->name, label_id.id) != 0)
continue;
if (!is_old_resource(res, old_res_list, old_num_resources))
alloc++;
}
victims = 0;
if (old_num_resources) {
/* convert old local-label-map to dimm-slot victim-map */
victim_map = bitmap_zalloc(nslot, GFP_KERNEL);
if (!victim_map)
return -ENOMEM;
/* mark unused labels for garbage collection */
for_each_clear_bit_le(slot, free, nslot) {
nd_label = to_label(ndd, slot);
if (!nsl_uuid_equal(ndd, nd_label, nsblk->uuid))
continue;
res = to_resource(ndd, nd_label);
if (res && is_old_resource(res, old_res_list,
old_num_resources))
continue;
slot = to_slot(ndd, nd_label);
set_bit(slot, victim_map);
victims++;
}
}
/* don't allow updates that consume the last label */
if (nfree - alloc < 0 || nfree - alloc + victims < 1) {
dev_info(&nsblk->common.dev, "insufficient label space\n");
bitmap_free(victim_map);
return -ENOSPC;
}
/* from here on we need to abort on error */
/* assign all resources to the namespace before writing the labels */
nsblk->res = NULL;
nsblk->num_resources = 0;
for_each_dpa_resource(ndd, res) {
if (strcmp(res->name, label_id.id) != 0)
continue;
if (!nsblk_add_resource(nd_region, ndd, nsblk, res->start)) {
rc = -ENOMEM;
goto abort;
}
}
/* release slots associated with any invalidated UUIDs */
mutex_lock(&nd_mapping->lock);
list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list)
if (test_and_clear_bit(ND_LABEL_REAP, &label_ent->flags)) {
reap_victim(nd_mapping, label_ent);
list_move(&label_ent->list, &list);
}
mutex_unlock(&nd_mapping->lock);
/*
* Find the resource associated with the first label in the set
* per the v1.2 namespace specification.
*/
for (i = 0; i < nsblk->num_resources; i++) {
struct resource *min = nsblk->res[min_dpa_idx];
res = nsblk->res[i];
if (res->start < min->start)
min_dpa_idx = i;
}
for (i = 0; i < nsblk->num_resources; i++) {
size_t offset;
res = nsblk->res[i];
if (is_old_resource(res, old_res_list, old_num_resources))
continue; /* carry-over */
slot = nd_label_alloc_slot(ndd);
if (slot == UINT_MAX) {
rc = -ENXIO;
goto abort;
}
dev_dbg(ndd->dev, "allocated: %d\n", slot);
nd_label = to_label(ndd, slot);
memset(nd_label, 0, sizeof_namespace_label(ndd));
nsl_set_uuid(ndd, nd_label, nsblk->uuid);
nsl_set_name(ndd, nd_label, nsblk->alt_name);
nsl_set_flags(ndd, nd_label, NSLABEL_FLAG_LOCAL);
nsl_set_blk_nlabel(ndd, nd_label, nsblk->num_resources,
i == min_dpa_idx);
nsl_set_blk_position(ndd, nd_label, i == min_dpa_idx);
nsl_set_blk_isetcookie(ndd, nd_label, nd_set->cookie2);
nsl_set_dpa(ndd, nd_label, res->start);
nsl_set_rawsize(ndd, nd_label, resource_size(res));
nsl_set_lbasize(ndd, nd_label, nsblk->lbasize);
nsl_set_slot(ndd, nd_label, slot);
nsl_set_type_guid(ndd, nd_label, &nd_set->type_guid);
nsl_set_claim_class(ndd, nd_label, ndns->claim_class);
nsl_calculate_checksum(ndd, nd_label);
/* update label */
offset = nd_label_offset(ndd, nd_label);
rc = nvdimm_set_config_data(ndd, offset, nd_label,
sizeof_namespace_label(ndd));
if (rc < 0)
goto abort;
}
/* free up now unused slots in the new index */
for_each_set_bit(slot, victim_map, victim_map ? nslot : 0) {
dev_dbg(ndd->dev, "free: %d\n", slot);
nd_label_free_slot(ndd, slot);
}
/* update index */
rc = nd_label_write_index(ndd, ndd->ns_next,
nd_inc_seq(__le32_to_cpu(nsindex->seq)), 0);
if (rc)
goto abort;
/*
* Now that the on-dimm labels are up to date, fix up the tracking
* entries in nd_mapping->labels
*/
nlabel = 0;
mutex_lock(&nd_mapping->lock);
list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) {
nd_label = label_ent->label;
if (!nd_label)
continue;
nlabel++;
if (!nsl_uuid_equal(ndd, nd_label, nsblk->uuid))
continue;
nlabel--;
list_move(&label_ent->list, &list);
label_ent->label = NULL;
}
list_splice_tail_init(&list, &nd_mapping->labels);
mutex_unlock(&nd_mapping->lock);
if (nlabel + nsblk->num_resources > num_labels) {
/*
* Bug, we can't end up with more resources than
* available labels
*/
WARN_ON_ONCE(1);
rc = -ENXIO;
goto out;
}
mutex_lock(&nd_mapping->lock);
label_ent = list_first_entry_or_null(&nd_mapping->labels,
typeof(*label_ent), list);
if (!label_ent) {
WARN_ON(1);
mutex_unlock(&nd_mapping->lock);
rc = -ENXIO;
goto out;
}
for_each_clear_bit_le(slot, free, nslot) {
nd_label = to_label(ndd, slot);
if (!nsl_uuid_equal(ndd, nd_label, nsblk->uuid))
continue;
res = to_resource(ndd, nd_label);
res->flags &= ~DPA_RESOURCE_ADJUSTED;
dev_vdbg(&nsblk->common.dev, "assign label slot: %d\n", slot);
list_for_each_entry_from(label_ent, &nd_mapping->labels, list) {
if (label_ent->label)
continue;
label_ent->label = nd_label;
nd_label = NULL;
break;
}
if (nd_label)
dev_WARN(&nsblk->common.dev,
"failed to track label slot%d\n", slot);
}
mutex_unlock(&nd_mapping->lock);
out:
kfree(old_res_list);
bitmap_free(victim_map);
return rc;
abort:
/*
* 1/ repair the allocated label bitmap in the index
* 2/ restore the resource list
*/
nd_label_copy(ndd, nsindex, to_current_namespace_index(ndd));
kfree(nsblk->res);
nsblk->res = old_res_list;
nsblk->num_resources = old_num_resources;
old_res_list = NULL;
goto out;
}
static int init_labels(struct nd_mapping *nd_mapping, int num_labels)
{
int i, old_num_labels = 0;
......@@ -1425,26 +1101,6 @@ int nd_pmem_namespace_label_update(struct nd_region *nd_region,
return 0;
}
int nd_blk_namespace_label_update(struct nd_region *nd_region,
struct nd_namespace_blk *nsblk, resource_size_t size)
{
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct resource *res;
int count = 0;
if (size == 0)
return del_labels(nd_mapping, nsblk->uuid);
for_each_dpa_resource(to_ndd(nd_mapping), res)
count++;
count = init_labels(nd_mapping, count);
if (count < 0)
return count;
return __blk_label_update(nd_region, nd_mapping, nsblk, count);
}
int __init nd_label_init(void)
{
WARN_ON(guid_parse(NVDIMM_BTT_GUID, &nvdimm_btt_guid));
......
......@@ -193,7 +193,7 @@ struct nd_namespace_label {
/**
* struct nd_label_id - identifier string for dpa allocation
* @id: "{blk|pmem}-<namespace uuid>"
* @id: "pmem-<namespace uuid>"
*/
struct nd_label_id {
char id[ND_LABEL_ID_SIZE];
......@@ -221,9 +221,6 @@ bool nd_label_free_slot(struct nvdimm_drvdata *ndd, u32 slot);
u32 nd_label_nfree(struct nvdimm_drvdata *ndd);
struct nd_region;
struct nd_namespace_pmem;
struct nd_namespace_blk;
int nd_pmem_namespace_label_update(struct nd_region *nd_region,
struct nd_namespace_pmem *nspm, resource_size_t size);
int nd_blk_namespace_label_update(struct nd_region *nd_region,
struct nd_namespace_blk *nsblk, resource_size_t size);
#endif /* __LABEL_H__ */
......@@ -32,21 +32,7 @@ static void namespace_pmem_release(struct device *dev)
kfree(nspm);
}
static void namespace_blk_release(struct device *dev)
{
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
struct nd_region *nd_region = to_nd_region(dev->parent);
if (nsblk->id >= 0)
ida_simple_remove(&nd_region->ns_ida, nsblk->id);
kfree(nsblk->alt_name);
kfree(nsblk->uuid);
kfree(nsblk->res);
kfree(nsblk);
}
static bool is_namespace_pmem(const struct device *dev);
static bool is_namespace_blk(const struct device *dev);
static bool is_namespace_io(const struct device *dev);
static int is_uuid_busy(struct device *dev, void *data)
......@@ -57,10 +43,6 @@ static int is_uuid_busy(struct device *dev, void *data)
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
uuid2 = nspm->uuid;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
uuid2 = nsblk->uuid;
} else if (is_nd_btt(dev)) {
struct nd_btt *nd_btt = to_nd_btt(dev);
......@@ -178,12 +160,6 @@ const char *nvdimm_namespace_disk_name(struct nd_namespace_common *ndns,
else
sprintf(name, "pmem%d%s", nd_region->id,
suffix ? suffix : "");
} else if (is_namespace_blk(&ndns->dev)) {
struct nd_namespace_blk *nsblk;
nsblk = to_nd_namespace_blk(&ndns->dev);
sprintf(name, "ndblk%d.%d%s", nd_region->id, nsblk->id,
suffix ? suffix : "");
} else {
return NULL;
}
......@@ -201,10 +177,6 @@ const uuid_t *nd_dev_to_uuid(struct device *dev)
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
return nspm->uuid;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
return nsblk->uuid;
} else
return &uuid_null;
}
......@@ -229,10 +201,6 @@ static ssize_t __alt_name_store(struct device *dev, const char *buf,
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
ns_altname = &nspm->alt_name;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
ns_altname = &nsblk->alt_name;
} else
return -ENXIO;
......@@ -264,83 +232,6 @@ static ssize_t __alt_name_store(struct device *dev, const char *buf,
return rc;
}
static resource_size_t nd_namespace_blk_size(struct nd_namespace_blk *nsblk)
{
struct nd_region *nd_region = to_nd_region(nsblk->common.dev.parent);
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct nd_label_id label_id;
resource_size_t size = 0;
struct resource *res;
if (!nsblk->uuid)
return 0;
nd_label_gen_id(&label_id, nsblk->uuid, NSLABEL_FLAG_LOCAL);
for_each_dpa_resource(ndd, res)
if (strcmp(res->name, label_id.id) == 0)
size += resource_size(res);
return size;
}
static bool __nd_namespace_blk_validate(struct nd_namespace_blk *nsblk)
{
struct nd_region *nd_region = to_nd_region(nsblk->common.dev.parent);
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct nd_label_id label_id;
struct resource *res;
int count, i;
if (!nsblk->uuid || !nsblk->lbasize || !ndd)
return false;
count = 0;
nd_label_gen_id(&label_id, nsblk->uuid, NSLABEL_FLAG_LOCAL);
for_each_dpa_resource(ndd, res) {
if (strcmp(res->name, label_id.id) != 0)
continue;
/*
* Resources with unacknowledged adjustments indicate a
* failure to update labels
*/
if (res->flags & DPA_RESOURCE_ADJUSTED)
return false;
count++;
}
/* These values match after a successful label update */
if (count != nsblk->num_resources)
return false;
for (i = 0; i < nsblk->num_resources; i++) {
struct resource *found = NULL;
for_each_dpa_resource(ndd, res)
if (res == nsblk->res[i]) {
found = res;
break;
}
/* stale resource */
if (!found)
return false;
}
return true;
}
resource_size_t nd_namespace_blk_validate(struct nd_namespace_blk *nsblk)
{
resource_size_t size;
nvdimm_bus_lock(&nsblk->common.dev);
size = __nd_namespace_blk_validate(nsblk);
nvdimm_bus_unlock(&nsblk->common.dev);
return size;
}
EXPORT_SYMBOL(nd_namespace_blk_validate);
static int nd_namespace_label_update(struct nd_region *nd_region,
struct device *dev)
{
......@@ -363,16 +254,6 @@ static int nd_namespace_label_update(struct nd_region *nd_region,
return 0;
return nd_pmem_namespace_label_update(nd_region, nspm, size);
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
resource_size_t size = nd_namespace_blk_size(nsblk);
if (size == 0 && nsblk->uuid)
/* delete allocation */;
else if (!nsblk->uuid || !nsblk->lbasize)
return 0;
return nd_blk_namespace_label_update(nd_region, nsblk, size);
} else
return -ENXIO;
}
......@@ -405,10 +286,6 @@ static ssize_t alt_name_show(struct device *dev,
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
ns_altname = nspm->alt_name;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
ns_altname = nsblk->alt_name;
} else
return -ENXIO;
......@@ -420,13 +297,11 @@ static int scan_free(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, struct nd_label_id *label_id,
resource_size_t n)
{
bool is_blk = strncmp(label_id->id, "blk", 3) == 0;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
int rc = 0;
while (n) {
struct resource *res, *last;
resource_size_t new_start;
last = NULL;
for_each_dpa_resource(ndd, res)
......@@ -444,16 +319,7 @@ static int scan_free(struct nd_region *nd_region,
continue;
}
/*
* Keep BLK allocations relegated to high DPA as much as
* possible
*/
if (is_blk)
new_start = res->start + n;
else
new_start = res->start;
rc = adjust_resource(res, new_start, resource_size(res) - n);
rc = adjust_resource(res, res->start, resource_size(res) - n);
if (rc == 0)
res->flags |= DPA_RESOURCE_ADJUSTED;
nd_dbg_dpa(nd_region, ndd, res, "shrink %d\n", rc);
......@@ -495,20 +361,12 @@ static resource_size_t init_dpa_allocation(struct nd_label_id *label_id,
struct nd_region *nd_region, struct nd_mapping *nd_mapping,
resource_size_t n)
{
bool is_blk = strncmp(label_id->id, "blk", 3) == 0;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
resource_size_t first_dpa;
struct resource *res;
int rc = 0;
/* allocate blk from highest dpa first */
if (is_blk)
first_dpa = nd_mapping->start + nd_mapping->size - n;
else
first_dpa = nd_mapping->start;
/* first resource allocation for this label-id or dimm */
res = nvdimm_allocate_dpa(ndd, label_id, first_dpa, n);
res = nvdimm_allocate_dpa(ndd, label_id, nd_mapping->start, n);
if (!res)
rc = -EBUSY;
......@@ -539,7 +397,6 @@ static void space_valid(struct nd_region *nd_region, struct nvdimm_drvdata *ndd,
resource_size_t n, struct resource *valid)
{
bool is_reserve = strcmp(label_id->id, "pmem-reserve") == 0;
bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
unsigned long align;
align = nd_region->align / nd_region->ndr_mappings;
......@@ -552,21 +409,6 @@ static void space_valid(struct nd_region *nd_region, struct nvdimm_drvdata *ndd,
if (is_reserve)
return;
if (!is_pmem) {
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct nvdimm_bus *nvdimm_bus;
struct blk_alloc_info info = {
.nd_mapping = nd_mapping,
.available = nd_mapping->size,
.res = valid,
};
WARN_ON(!is_nd_blk(&nd_region->dev));
nvdimm_bus = walk_to_nvdimm_bus(&nd_region->dev);
device_for_each_child(&nvdimm_bus->dev, &info, alias_dpa_busy);
return;
}
/* allocation needs to be contiguous, so this is all or nothing */
if (resource_size(valid) < n)
goto invalid;
......@@ -594,7 +436,6 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
resource_size_t n)
{
resource_size_t mapping_end = nd_mapping->start + nd_mapping->size - 1;
bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct resource *res, *exist = NULL, valid;
const resource_size_t to_allocate = n;
......@@ -692,10 +533,6 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
}
if (strcmp(action, "allocate") == 0) {
/* BLK allocate bottom up */
if (!is_pmem)
valid.start += available - allocate;
new_res = nvdimm_allocate_dpa(ndd, label_id,
valid.start, allocate);
if (!new_res)
......@@ -731,12 +568,7 @@ static resource_size_t scan_allocate(struct nd_region *nd_region,
return 0;
}
/*
* If we allocated nothing in the BLK case it may be because we are in
* an initial "pmem-reserve pass". Only do an initial BLK allocation
* when none of the DPA space is reserved.
*/
if ((is_pmem || !ndd->dpa.child) && n == to_allocate)
if (n == to_allocate)
return init_dpa_allocation(label_id, nd_region, nd_mapping, n);
return n;
}
......@@ -795,7 +627,7 @@ int __reserve_free_pmem(struct device *dev, void *data)
if (nd_mapping->nvdimm != nvdimm)
continue;
n = nd_pmem_available_dpa(nd_region, nd_mapping, &rem);
n = nd_pmem_available_dpa(nd_region, nd_mapping);
if (n == 0)
return 0;
rem = scan_allocate(nd_region, nd_mapping, &label_id, n);
......@@ -820,19 +652,6 @@ void release_free_pmem(struct nvdimm_bus *nvdimm_bus,
nvdimm_free_dpa(ndd, res);
}
static int reserve_free_pmem(struct nvdimm_bus *nvdimm_bus,
struct nd_mapping *nd_mapping)
{
struct nvdimm *nvdimm = nd_mapping->nvdimm;
int rc;
rc = device_for_each_child(&nvdimm_bus->dev, nvdimm,
__reserve_free_pmem);
if (rc)
release_free_pmem(nvdimm_bus, nd_mapping);
return rc;
}
/**
* grow_dpa_allocation - for each dimm allocate n bytes for @label_id
* @nd_region: the set of dimms to allocate @n more bytes from
......@@ -849,37 +668,14 @@ static int reserve_free_pmem(struct nvdimm_bus *nvdimm_bus,
static int grow_dpa_allocation(struct nd_region *nd_region,
struct nd_label_id *label_id, resource_size_t n)
{
struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(&nd_region->dev);
bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0;
int i;
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
resource_size_t rem = n;
int rc, j;
/*
* In the BLK case try once with all unallocated PMEM
* reserved, and once without
*/
for (j = is_pmem; j < 2; j++) {
bool blk_only = j == 0;
if (blk_only) {
rc = reserve_free_pmem(nvdimm_bus, nd_mapping);
if (rc)
return rc;
}
rem = scan_allocate(nd_region, nd_mapping,
label_id, rem);
if (blk_only)
release_free_pmem(nvdimm_bus, nd_mapping);
/* try again and allow encroachments into PMEM */
if (rem == 0)
break;
}
int rc;
rem = scan_allocate(nd_region, nd_mapping, label_id, rem);
dev_WARN_ONCE(&nd_region->dev, rem,
"allocation underrun: %#llx of %#llx bytes\n",
(unsigned long long) n - rem,
......@@ -966,12 +762,6 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
uuid = nspm->uuid;
id = nspm->id;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
uuid = nsblk->uuid;
flags = NSLABEL_FLAG_LOCAL;
id = nsblk->id;
}
/*
......@@ -998,8 +788,8 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
ndd = to_ndd(nd_mapping);
/*
* All dimms in an interleave set, or the base dimm for a blk
* region, need to be enabled for the size to be changed.
* All dimms in an interleave set, need to be enabled
* for the size to be changed.
*/
if (!ndd)
return -ENXIO;
......@@ -1067,10 +857,6 @@ static ssize_t size_store(struct device *dev,
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
uuid = &nspm->uuid;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
uuid = &nsblk->uuid;
}
if (rc == 0 && val == 0 && uuid) {
......@@ -1095,8 +881,6 @@ resource_size_t __nvdimm_namespace_capacity(struct nd_namespace_common *ndns)
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
return resource_size(&nspm->nsio.res);
} else if (is_namespace_blk(dev)) {
return nd_namespace_blk_size(to_nd_namespace_blk(dev));
} else if (is_namespace_io(dev)) {
struct nd_namespace_io *nsio = to_nd_namespace_io(dev);
......@@ -1152,12 +936,8 @@ static uuid_t *namespace_to_uuid(struct device *dev)
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
return nspm->uuid;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
return nsblk->uuid;
} else
return ERR_PTR(-ENXIO);
}
return ERR_PTR(-ENXIO);
}
static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
......@@ -1183,7 +963,6 @@ static int namespace_update_uuid(struct nd_region *nd_region,
struct device *dev, uuid_t *new_uuid,
uuid_t **old_uuid)
{
u32 flags = is_namespace_blk(dev) ? NSLABEL_FLAG_LOCAL : 0;
struct nd_label_id old_label_id;
struct nd_label_id new_label_id;
int i;
......@@ -1214,8 +993,8 @@ static int namespace_update_uuid(struct nd_region *nd_region,
return -EBUSY;
}
nd_label_gen_id(&old_label_id, *old_uuid, flags);
nd_label_gen_id(&new_label_id, new_uuid, flags);
nd_label_gen_id(&old_label_id, *old_uuid, 0);
nd_label_gen_id(&new_label_id, new_uuid, 0);
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
......@@ -1261,10 +1040,6 @@ static ssize_t uuid_store(struct device *dev,
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
ns_uuid = &nspm->uuid;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
ns_uuid = &nsblk->uuid;
} else
return -ENXIO;
......@@ -1313,21 +1088,11 @@ static ssize_t resource_show(struct device *dev,
}
static DEVICE_ATTR_ADMIN_RO(resource);
static const unsigned long blk_lbasize_supported[] = { 512, 520, 528,
4096, 4104, 4160, 4224, 0 };
static const unsigned long pmem_lbasize_supported[] = { 512, 4096, 0 };
static ssize_t sector_size_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
return nd_size_select_show(nsblk->lbasize,
blk_lbasize_supported, buf);
}
if (is_namespace_pmem(dev)) {
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
......@@ -1345,12 +1110,7 @@ static ssize_t sector_size_store(struct device *dev,
unsigned long *lbasize;
ssize_t rc = 0;
if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
lbasize = &nsblk->lbasize;
supported = blk_lbasize_supported;
} else if (is_namespace_pmem(dev)) {
if (is_namespace_pmem(dev)) {
struct nd_namespace_pmem *nspm = to_nd_namespace_pmem(dev);
lbasize = &nspm->lbasize;
......@@ -1390,11 +1150,6 @@ static ssize_t dpa_extents_show(struct device *dev,
uuid = nspm->uuid;
flags = 0;
} else if (is_namespace_blk(dev)) {
struct nd_namespace_blk *nsblk = to_nd_namespace_blk(dev);
uuid = nsblk->uuid;
flags = NSLABEL_FLAG_LOCAL;
}
if (!uuid)
......@@ -1627,10 +1382,7 @@ static umode_t namespace_visible(struct kobject *kobj,
{
struct device *dev = container_of(kobj, struct device, kobj);
if (a == &dev_attr_resource.attr && is_namespace_blk(dev))
return 0;
if (is_namespace_pmem(dev) || is_namespace_blk(dev)) {
if (is_namespace_pmem(dev)) {
if (a == &dev_attr_size.attr)
return 0644;
......@@ -1671,22 +1423,11 @@ static const struct device_type namespace_pmem_device_type = {
.groups = nd_namespace_attribute_groups,
};
static const struct device_type namespace_blk_device_type = {
.name = "nd_namespace_blk",
.release = namespace_blk_release,
.groups = nd_namespace_attribute_groups,
};
static bool is_namespace_pmem(const struct device *dev)
{
return dev ? dev->type == &namespace_pmem_device_type : false;
}
static bool is_namespace_blk(const struct device *dev)
{
return dev ? dev->type == &namespace_blk_device_type : false;
}
static bool is_namespace_io(const struct device *dev)
{
return dev ? dev->type == &namespace_io_device_type : false;
......@@ -1769,18 +1510,6 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
nspm = to_nd_namespace_pmem(&ndns->dev);
if (uuid_not_set(nspm->uuid, &ndns->dev, __func__))
return ERR_PTR(-ENODEV);
} else if (is_namespace_blk(&ndns->dev)) {
struct nd_namespace_blk *nsblk;
nsblk = to_nd_namespace_blk(&ndns->dev);
if (uuid_not_set(nsblk->uuid, &ndns->dev, __func__))
return ERR_PTR(-ENODEV);
if (!nsblk->lbasize) {
dev_dbg(&ndns->dev, "sector size not set\n");
return ERR_PTR(-ENODEV);
}
if (!nd_namespace_blk_validate(nsblk))
return ERR_PTR(-ENODEV);
}
return ndns;
......@@ -1790,16 +1519,12 @@ EXPORT_SYMBOL(nvdimm_namespace_common_probe);
int devm_namespace_enable(struct device *dev, struct nd_namespace_common *ndns,
resource_size_t size)
{
if (is_namespace_blk(&ndns->dev))
return 0;
return devm_nsio_enable(dev, to_nd_namespace_io(&ndns->dev), size);
}
EXPORT_SYMBOL_GPL(devm_namespace_enable);
void devm_namespace_disable(struct device *dev, struct nd_namespace_common *ndns)
{
if (is_namespace_blk(&ndns->dev))
return;
devm_nsio_disable(dev, to_nd_namespace_io(&ndns->dev));
}
EXPORT_SYMBOL_GPL(devm_namespace_disable);
......@@ -2014,10 +1739,7 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
/*
* Fix up each mapping's 'labels' to have the validated pmem label for
* that position at labels[0], and NULL at labels[1]. In the process,
* check that the namespace aligns with interleave-set. We know
* that it does not overlap with any blk namespaces by virtue of
* the dimm being enabled (i.e. nd_label_reserve_dpa()
* succeeded).
* check that the namespace aligns with interleave-set.
*/
nsl_get_uuid(ndd, nd_label, &uuid);
rc = select_pmem_id(nd_region, &uuid);
......@@ -2077,54 +1799,6 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
return ERR_PTR(rc);
}
struct resource *nsblk_add_resource(struct nd_region *nd_region,
struct nvdimm_drvdata *ndd, struct nd_namespace_blk *nsblk,
resource_size_t start)
{
struct nd_label_id label_id;
struct resource *res;
nd_label_gen_id(&label_id, nsblk->uuid, NSLABEL_FLAG_LOCAL);
res = krealloc(nsblk->res,
sizeof(void *) * (nsblk->num_resources + 1),
GFP_KERNEL);
if (!res)
return NULL;
nsblk->res = (struct resource **) res;
for_each_dpa_resource(ndd, res)
if (strcmp(res->name, label_id.id) == 0
&& res->start == start) {
nsblk->res[nsblk->num_resources++] = res;
return res;
}
return NULL;
}
static struct device *nd_namespace_blk_create(struct nd_region *nd_region)
{
struct nd_namespace_blk *nsblk;
struct device *dev;
if (!is_nd_blk(&nd_region->dev))
return NULL;
nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
if (!nsblk)
return NULL;
dev = &nsblk->common.dev;
dev->type = &namespace_blk_device_type;
nsblk->id = ida_simple_get(&nd_region->ns_ida, 0, 0, GFP_KERNEL);
if (nsblk->id < 0) {
kfree(nsblk);
return NULL;
}
dev_set_name(dev, "namespace%d.%d", nd_region->id, nsblk->id);
dev->parent = &nd_region->dev;
return &nsblk->common.dev;
}
static struct device *nd_namespace_pmem_create(struct nd_region *nd_region)
{
struct nd_namespace_pmem *nspm;
......@@ -2163,18 +1837,14 @@ void nd_region_create_ns_seed(struct nd_region *nd_region)
if (nd_region_to_nstype(nd_region) == ND_DEVICE_NAMESPACE_IO)
return;
if (is_nd_blk(&nd_region->dev))
nd_region->ns_seed = nd_namespace_blk_create(nd_region);
else
nd_region->ns_seed = nd_namespace_pmem_create(nd_region);
nd_region->ns_seed = nd_namespace_pmem_create(nd_region);
/*
* Seed creation failures are not fatal, provisioning is simply
* disabled until memory becomes available
*/
if (!nd_region->ns_seed)
dev_err(&nd_region->dev, "failed to create %s namespace\n",
is_nd_blk(&nd_region->dev) ? "blk" : "pmem");
dev_err(&nd_region->dev, "failed to create namespace\n");
else
nd_device_register(nd_region->ns_seed);
}
......@@ -2225,7 +1895,6 @@ static int add_namespace_resource(struct nd_region *nd_region,
for (i = 0; i < count; i++) {
uuid_t *uuid = namespace_to_uuid(devs[i]);
struct resource *res;
if (IS_ERR(uuid)) {
WARN_ON(1);
......@@ -2234,91 +1903,23 @@ static int add_namespace_resource(struct nd_region *nd_region,
if (!nsl_uuid_equal(ndd, nd_label, uuid))
continue;
if (is_namespace_blk(devs[i])) {
res = nsblk_add_resource(nd_region, ndd,
to_nd_namespace_blk(devs[i]),
nsl_get_dpa(ndd, nd_label));
if (!res)
return -ENXIO;
nd_dbg_dpa(nd_region, ndd, res, "%d assign\n", count);
} else {
dev_err(&nd_region->dev,
"error: conflicting extents for uuid: %pUb\n",
uuid);
return -ENXIO;
}
break;
dev_err(&nd_region->dev,
"error: conflicting extents for uuid: %pUb\n", uuid);
return -ENXIO;
}
return i;
}
static struct device *create_namespace_blk(struct nd_region *nd_region,
struct nd_namespace_label *nd_label, int count)
{
struct nd_mapping *nd_mapping = &nd_region->mapping[0];
struct nd_interleave_set *nd_set = nd_region->nd_set;
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
struct nd_namespace_blk *nsblk;
char name[NSLABEL_NAME_LEN];
struct device *dev = NULL;
struct resource *res;
uuid_t uuid;
if (!nsl_validate_type_guid(ndd, nd_label, &nd_set->type_guid))
return ERR_PTR(-EAGAIN);
if (!nsl_validate_blk_isetcookie(ndd, nd_label, nd_set->cookie2))
return ERR_PTR(-EAGAIN);
nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
if (!nsblk)
return ERR_PTR(-ENOMEM);
dev = &nsblk->common.dev;
dev->type = &namespace_blk_device_type;
dev->parent = &nd_region->dev;
nsblk->id = -1;
nsblk->lbasize = nsl_get_lbasize(ndd, nd_label);
nsl_get_uuid(ndd, nd_label, &uuid);
nsblk->uuid = kmemdup(&uuid, sizeof(uuid_t), GFP_KERNEL);
nsblk->common.claim_class = nsl_get_claim_class(ndd, nd_label);
if (!nsblk->uuid)
goto blk_err;
nsl_get_name(ndd, nd_label, name);
if (name[0]) {
nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN, GFP_KERNEL);
if (!nsblk->alt_name)
goto blk_err;
}
res = nsblk_add_resource(nd_region, ndd, nsblk,
nsl_get_dpa(ndd, nd_label));
if (!res)
goto blk_err;
nd_dbg_dpa(nd_region, ndd, res, "%d: assign\n", count);
return dev;
blk_err:
namespace_blk_release(dev);
return ERR_PTR(-ENXIO);
}
static int cmp_dpa(const void *a, const void *b)
{
const struct device *dev_a = *(const struct device **) a;
const struct device *dev_b = *(const struct device **) b;
struct nd_namespace_blk *nsblk_a, *nsblk_b;
struct nd_namespace_pmem *nspm_a, *nspm_b;
if (is_namespace_io(dev_a))
return 0;
if (is_namespace_blk(dev_a)) {
nsblk_a = to_nd_namespace_blk(dev_a);
nsblk_b = to_nd_namespace_blk(dev_b);
return memcmp(&nsblk_a->res[0]->start, &nsblk_b->res[0]->start,
sizeof(resource_size_t));
}
nspm_a = to_nd_namespace_pmem(dev_a);
nspm_b = to_nd_namespace_pmem(dev_b);
......@@ -2339,16 +1940,9 @@ static struct device **scan_labels(struct nd_region *nd_region)
list_for_each_entry_safe(label_ent, e, &nd_mapping->labels, list) {
struct nd_namespace_label *nd_label = label_ent->label;
struct device **__devs;
u32 flags;
if (!nd_label)
continue;
flags = nsl_get_flags(ndd, nd_label);
if (is_nd_blk(&nd_region->dev)
== !!(flags & NSLABEL_FLAG_LOCAL))
/* pass, region matches label type */;
else
continue;
/* skip labels that describe extents outside of the region */
if (nsl_get_dpa(ndd, nd_label) < nd_mapping->start ||
......@@ -2367,12 +1961,7 @@ static struct device **scan_labels(struct nd_region *nd_region)
kfree(devs);
devs = __devs;
if (is_nd_blk(&nd_region->dev))
dev = create_namespace_blk(nd_region, nd_label, count);
else
dev = create_namespace_pmem(nd_region, nd_mapping,
nd_label);
dev = create_namespace_pmem(nd_region, nd_mapping, nd_label);
if (IS_ERR(dev)) {
switch (PTR_ERR(dev)) {
case -EAGAIN:
......@@ -2389,35 +1978,25 @@ static struct device **scan_labels(struct nd_region *nd_region)
}
dev_dbg(&nd_region->dev, "discovered %d %s namespace%s\n",
count, is_nd_blk(&nd_region->dev)
? "blk" : "pmem", count == 1 ? "" : "s");
dev_dbg(&nd_region->dev, "discovered %d namespace%s\n", count,
count == 1 ? "" : "s");
if (count == 0) {
struct nd_namespace_pmem *nspm;
/* Publish a zero-sized namespace for userspace to configure. */
nd_mapping_free_labels(nd_mapping);
devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
if (!devs)
goto err;
if (is_nd_blk(&nd_region->dev)) {
struct nd_namespace_blk *nsblk;
nsblk = kzalloc(sizeof(*nsblk), GFP_KERNEL);
if (!nsblk)
goto err;
dev = &nsblk->common.dev;
dev->type = &namespace_blk_device_type;
} else {
struct nd_namespace_pmem *nspm;
nspm = kzalloc(sizeof(*nspm), GFP_KERNEL);
if (!nspm)
goto err;
dev = &nspm->nsio.common.dev;
dev->type = &namespace_pmem_device_type;
nd_namespace_pmem_set_resource(nd_region, nspm, 0);
}
nspm = kzalloc(sizeof(*nspm), GFP_KERNEL);
if (!nspm)
goto err;
dev = &nspm->nsio.common.dev;
dev->type = &namespace_pmem_device_type;
nd_namespace_pmem_set_resource(nd_region, nspm, 0);
dev->parent = &nd_region->dev;
devs[count++] = dev;
} else if (is_memory(&nd_region->dev)) {
......@@ -2452,10 +2031,7 @@ static struct device **scan_labels(struct nd_region *nd_region)
err:
if (devs) {
for (i = 0; devs[i]; i++)
if (is_nd_blk(&nd_region->dev))
namespace_blk_release(devs[i]);
else
namespace_pmem_release(devs[i]);
namespace_pmem_release(devs[i]);
kfree(devs);
}
return NULL;
......@@ -2554,12 +2130,6 @@ static int init_active_labels(struct nd_region *nd_region)
if (!label_ent)
break;
label = nd_label_active(ndd, j);
if (test_bit(NDD_NOBLK, &nvdimm->flags)) {
u32 flags = nsl_get_flags(ndd, label);
flags &= ~NSLABEL_FLAG_LOCAL;
nsl_set_flags(ndd, label, flags);
}
label_ent->label = label;
mutex_lock(&nd_mapping->lock);
......@@ -2603,7 +2173,6 @@ int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
devs = create_namespace_io(nd_region);
break;
case ND_DEVICE_NAMESPACE_PMEM:
case ND_DEVICE_NAMESPACE_BLK:
devs = create_namespaces(nd_region);
break;
default:
......@@ -2618,19 +2187,12 @@ int nd_region_register_namespaces(struct nd_region *nd_region, int *err)
struct device *dev = devs[i];
int id;
if (type == ND_DEVICE_NAMESPACE_BLK) {
struct nd_namespace_blk *nsblk;
nsblk = to_nd_namespace_blk(dev);
id = ida_simple_get(&nd_region->ns_ida, 0, 0,
GFP_KERNEL);
nsblk->id = id;
} else if (type == ND_DEVICE_NAMESPACE_PMEM) {
if (type == ND_DEVICE_NAMESPACE_PMEM) {
struct nd_namespace_pmem *nspm;
nspm = to_nd_namespace_pmem(dev);
id = ida_simple_get(&nd_region->ns_ida, 0, 0,
GFP_KERNEL);
GFP_KERNEL);
nspm->id = id;
} else
id = i;
......
......@@ -82,30 +82,12 @@ static inline void nvdimm_security_overwrite_query(struct work_struct *work)
}
#endif
/**
* struct blk_alloc_info - tracking info for BLK dpa scanning
* @nd_mapping: blk region mapping boundaries
* @available: decremented in alias_dpa_busy as aliased PMEM is scanned
* @busy: decremented in blk_dpa_busy to account for ranges already
* handled by alias_dpa_busy
* @res: alias_dpa_busy interprets this a free space range that needs to
* be truncated to the valid BLK allocation starting DPA, blk_dpa_busy
* treats it as a busy range that needs the aliased PMEM ranges
* truncated.
*/
struct blk_alloc_info {
struct nd_mapping *nd_mapping;
resource_size_t available, busy;
struct resource *res;
};
bool is_nvdimm(struct device *dev);
bool is_nd_pmem(struct device *dev);
bool is_nd_volatile(struct device *dev);
bool is_nd_blk(struct device *dev);
static inline bool is_nd_region(struct device *dev)
{
return is_nd_pmem(dev) || is_nd_blk(dev) || is_nd_volatile(dev);
return is_nd_pmem(dev) || is_nd_volatile(dev);
}
static inline bool is_memory(struct device *dev)
{
......@@ -142,17 +124,12 @@ resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region,
struct nd_mapping *nd_mapping);
resource_size_t nd_region_allocatable_dpa(struct nd_region *nd_region);
resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region,
struct nd_mapping *nd_mapping, resource_size_t *overlap);
resource_size_t nd_blk_available_dpa(struct nd_region *nd_region);
struct nd_mapping *nd_mapping);
resource_size_t nd_region_available_dpa(struct nd_region *nd_region);
int nd_region_conflict(struct nd_region *nd_region, resource_size_t start,
resource_size_t size);
resource_size_t nvdimm_allocated_dpa(struct nvdimm_drvdata *ndd,
struct nd_label_id *label_id);
int alias_dpa_busy(struct device *dev, void *data);
struct resource *nsblk_add_resource(struct nd_region *nd_region,
struct nvdimm_drvdata *ndd, struct nd_namespace_blk *nsblk,
resource_size_t start);
int nvdimm_num_label_slots(struct nvdimm_drvdata *ndd);
void get_ndd(struct nvdimm_drvdata *ndd);
resource_size_t __nvdimm_namespace_capacity(struct nd_namespace_common *ndns);
......
......@@ -295,9 +295,6 @@ static inline const u8 *nsl_uuid_raw(struct nvdimm_drvdata *ndd,
return nd_label->efi.uuid;
}
bool nsl_validate_blk_isetcookie(struct nvdimm_drvdata *ndd,
struct nd_namespace_label *nd_label,
u64 isetcookie);
bool nsl_validate_type_guid(struct nvdimm_drvdata *ndd,
struct nd_namespace_label *nd_label, guid_t *guid);
enum nvdimm_claim_class nsl_get_claim_class(struct nvdimm_drvdata *ndd,
......@@ -437,14 +434,6 @@ static inline bool nsl_validate_nlabel(struct nd_region *nd_region,
return nsl_get_nlabel(ndd, nd_label) == nd_region->ndr_mappings;
}
struct nd_blk_region {
int (*enable)(struct nvdimm_bus *nvdimm_bus, struct device *dev);
int (*do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
void *iobuf, u64 len, int rw);
void *blk_provider_data;
struct nd_region nd_region;
};
/*
* Lookup next in the repeating sequence of 01, 10, and 11.
*/
......@@ -672,7 +661,6 @@ static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
return -ENXIO;
}
#endif
int nd_blk_region_init(struct nd_region *nd_region);
int nd_region_activate(struct nd_region *nd_region);
static inline bool is_bad_pmem(struct badblocks *bb, sector_t sector,
unsigned int len)
......@@ -687,7 +675,6 @@ static inline bool is_bad_pmem(struct badblocks *bb, sector_t sector,
return false;
}
resource_size_t nd_namespace_blk_validate(struct nd_namespace_blk *nsblk);
const uuid_t *nd_dev_to_uuid(struct device *dev);
bool pmem_should_map_pages(struct device *dev);
#endif /* __ND_H__ */
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* nd_perf.c: NVDIMM Device Performance Monitoring Unit support
*
* Perf interface to expose nvdimm performance stats.
*
* Copyright (C) 2021 IBM Corporation
*/
#define pr_fmt(fmt) "nvdimm_pmu: " fmt
#include <linux/nd.h>
#include <linux/platform_device.h>
#define EVENT(_name, _code) enum{_name = _code}
/*
* NVDIMM Events codes.
*/
/* Controller Reset Count */
EVENT(CTL_RES_CNT, 0x1);
/* Controller Reset Elapsed Time */
EVENT(CTL_RES_TM, 0x2);
/* Power-on Seconds */
EVENT(POWERON_SECS, 0x3);
/* Life Remaining */
EVENT(MEM_LIFE, 0x4);
/* Critical Resource Utilization */
EVENT(CRI_RES_UTIL, 0x5);
/* Host Load Count */
EVENT(HOST_L_CNT, 0x6);
/* Host Store Count */
EVENT(HOST_S_CNT, 0x7);
/* Host Store Duration */
EVENT(HOST_S_DUR, 0x8);
/* Host Load Duration */
EVENT(HOST_L_DUR, 0x9);
/* Media Read Count */
EVENT(MED_R_CNT, 0xa);
/* Media Write Count */
EVENT(MED_W_CNT, 0xb);
/* Media Read Duration */
EVENT(MED_R_DUR, 0xc);
/* Media Write Duration */
EVENT(MED_W_DUR, 0xd);
/* Cache Read Hit Count */
EVENT(CACHE_RH_CNT, 0xe);
/* Cache Write Hit Count */
EVENT(CACHE_WH_CNT, 0xf);
/* Fast Write Count */
EVENT(FAST_W_CNT, 0x10);
NVDIMM_EVENT_ATTR(ctl_res_cnt, CTL_RES_CNT);
NVDIMM_EVENT_ATTR(ctl_res_tm, CTL_RES_TM);
NVDIMM_EVENT_ATTR(poweron_secs, POWERON_SECS);
NVDIMM_EVENT_ATTR(mem_life, MEM_LIFE);
NVDIMM_EVENT_ATTR(cri_res_util, CRI_RES_UTIL);
NVDIMM_EVENT_ATTR(host_l_cnt, HOST_L_CNT);
NVDIMM_EVENT_ATTR(host_s_cnt, HOST_S_CNT);
NVDIMM_EVENT_ATTR(host_s_dur, HOST_S_DUR);
NVDIMM_EVENT_ATTR(host_l_dur, HOST_L_DUR);
NVDIMM_EVENT_ATTR(med_r_cnt, MED_R_CNT);
NVDIMM_EVENT_ATTR(med_w_cnt, MED_W_CNT);
NVDIMM_EVENT_ATTR(med_r_dur, MED_R_DUR);
NVDIMM_EVENT_ATTR(med_w_dur, MED_W_DUR);
NVDIMM_EVENT_ATTR(cache_rh_cnt, CACHE_RH_CNT);
NVDIMM_EVENT_ATTR(cache_wh_cnt, CACHE_WH_CNT);
NVDIMM_EVENT_ATTR(fast_w_cnt, FAST_W_CNT);
static struct attribute *nvdimm_events_attr[] = {
NVDIMM_EVENT_PTR(CTL_RES_CNT),
NVDIMM_EVENT_PTR(CTL_RES_TM),
NVDIMM_EVENT_PTR(POWERON_SECS),
NVDIMM_EVENT_PTR(MEM_LIFE),
NVDIMM_EVENT_PTR(CRI_RES_UTIL),
NVDIMM_EVENT_PTR(HOST_L_CNT),
NVDIMM_EVENT_PTR(HOST_S_CNT),
NVDIMM_EVENT_PTR(HOST_S_DUR),
NVDIMM_EVENT_PTR(HOST_L_DUR),
NVDIMM_EVENT_PTR(MED_R_CNT),
NVDIMM_EVENT_PTR(MED_W_CNT),
NVDIMM_EVENT_PTR(MED_R_DUR),
NVDIMM_EVENT_PTR(MED_W_DUR),
NVDIMM_EVENT_PTR(CACHE_RH_CNT),
NVDIMM_EVENT_PTR(CACHE_WH_CNT),
NVDIMM_EVENT_PTR(FAST_W_CNT),
NULL
};
static struct attribute_group nvdimm_pmu_events_group = {
.name = "events",
.attrs = nvdimm_events_attr,
};
PMU_FORMAT_ATTR(event, "config:0-4");
static struct attribute *nvdimm_pmu_format_attr[] = {
&format_attr_event.attr,
NULL,
};
static struct attribute_group nvdimm_pmu_format_group = {
.name = "format",
.attrs = nvdimm_pmu_format_attr,
};
ssize_t nvdimm_events_sysfs_show(struct device *dev,
struct device_attribute *attr, char *page)
{
struct perf_pmu_events_attr *pmu_attr;
pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
}
static ssize_t nvdimm_pmu_cpumask_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pmu *pmu = dev_get_drvdata(dev);
struct nvdimm_pmu *nd_pmu;
nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
}
static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
{
struct nvdimm_pmu *nd_pmu;
u32 target;
int nodeid;
const struct cpumask *cpumask;
nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
/* Clear it, incase given cpu is set in nd_pmu->arch_cpumask */
cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
/*
* If given cpu is not same as current designated cpu for
* counter access, just return.
*/
if (cpu != nd_pmu->cpu)
return 0;
/* Check for any active cpu in nd_pmu->arch_cpumask */
target = cpumask_any(&nd_pmu->arch_cpumask);
/*
* Incase we don't have any active cpu in nd_pmu->arch_cpumask,
* check in given cpu's numa node list.
*/
if (target >= nr_cpu_ids) {
nodeid = cpu_to_node(cpu);
cpumask = cpumask_of_node(nodeid);
target = cpumask_any_but(cpumask, cpu);
}
nd_pmu->cpu = target;
/* Migrate nvdimm pmu events to the new target cpu if valid */
if (target >= 0 && target < nr_cpu_ids)
perf_pmu_migrate_context(&nd_pmu->pmu, cpu, target);
return 0;
}
static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
{
struct nvdimm_pmu *nd_pmu;
nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
if (nd_pmu->cpu >= nr_cpu_ids)
nd_pmu->cpu = cpu;
return 0;
}
static int create_cpumask_attr_group(struct nvdimm_pmu *nd_pmu)
{
struct perf_pmu_events_attr *pmu_events_attr;
struct attribute **attrs_group;
struct attribute_group *nvdimm_pmu_cpumask_group;
pmu_events_attr = kzalloc(sizeof(*pmu_events_attr), GFP_KERNEL);
if (!pmu_events_attr)
return -ENOMEM;
attrs_group = kzalloc(2 * sizeof(struct attribute *), GFP_KERNEL);
if (!attrs_group) {
kfree(pmu_events_attr);
return -ENOMEM;
}
/* Allocate memory for cpumask attribute group */
nvdimm_pmu_cpumask_group = kzalloc(sizeof(*nvdimm_pmu_cpumask_group), GFP_KERNEL);
if (!nvdimm_pmu_cpumask_group) {
kfree(pmu_events_attr);
kfree(attrs_group);
return -ENOMEM;
}
sysfs_attr_init(&pmu_events_attr->attr.attr);
pmu_events_attr->attr.attr.name = "cpumask";
pmu_events_attr->attr.attr.mode = 0444;
pmu_events_attr->attr.show = nvdimm_pmu_cpumask_show;
attrs_group[0] = &pmu_events_attr->attr.attr;
attrs_group[1] = NULL;
nvdimm_pmu_cpumask_group->attrs = attrs_group;
nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = nvdimm_pmu_cpumask_group;
return 0;
}
static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
{
int nodeid, rc;
const struct cpumask *cpumask;
/*
* Incase of cpu hotplug feature, arch specific code
* can provide required cpumask which can be used
* to get designatd cpu for counter access.
* Check for any active cpu in nd_pmu->arch_cpumask.
*/
if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
} else {
/* pick active cpu from the cpumask of device numa node. */
nodeid = dev_to_node(nd_pmu->dev);
cpumask = cpumask_of_node(nodeid);
nd_pmu->cpu = cpumask_any(cpumask);
}
rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
if (rc < 0)
return rc;
nd_pmu->cpuhp_state = rc;
/* Register the pmu instance for cpu hotplug */
rc = cpuhp_state_add_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
if (rc) {
cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
return rc;
}
/* Create cpumask attribute group */
rc = create_cpumask_attr_group(nd_pmu);
if (rc) {
cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
return rc;
}
return 0;
}
static void nvdimm_pmu_free_hotplug_memory(struct nvdimm_pmu *nd_pmu)
{
cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
if (nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR])
kfree(nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR]->attrs);
kfree(nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR]);
}
int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev)
{
int rc;
if (!nd_pmu || !pdev)
return -EINVAL;
/* event functions like add/del/read/event_init and pmu name should not be NULL */
if (WARN_ON_ONCE(!(nd_pmu->pmu.event_init && nd_pmu->pmu.add &&
nd_pmu->pmu.del && nd_pmu->pmu.read && nd_pmu->pmu.name)))
return -EINVAL;
nd_pmu->pmu.attr_groups = kzalloc((NVDIMM_PMU_NULL_ATTR + 1) *
sizeof(struct attribute_group *), GFP_KERNEL);
if (!nd_pmu->pmu.attr_groups)
return -ENOMEM;
/*
* Add platform_device->dev pointer to nvdimm_pmu to access
* device data in events functions.
*/
nd_pmu->dev = &pdev->dev;
/* Fill attribute groups for the nvdimm pmu device */
nd_pmu->pmu.attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
nd_pmu->pmu.attr_groups[NVDIMM_PMU_EVENT_ATTR] = &nvdimm_pmu_events_group;
nd_pmu->pmu.attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
/* Fill attribute group for cpumask */
rc = nvdimm_pmu_cpu_hotplug_init(nd_pmu);
if (rc) {
pr_info("cpu hotplug feature failed for device: %s\n", nd_pmu->pmu.name);
kfree(nd_pmu->pmu.attr_groups);
return rc;
}
rc = perf_pmu_register(&nd_pmu->pmu, nd_pmu->pmu.name, -1);
if (rc) {
kfree(nd_pmu->pmu.attr_groups);
nvdimm_pmu_free_hotplug_memory(nd_pmu);
return rc;
}
pr_info("%s NVDIMM performance monitor support registered\n",
nd_pmu->pmu.name);
return 0;
}
EXPORT_SYMBOL_GPL(register_nvdimm_pmu);
void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu)
{
perf_pmu_unregister(&nd_pmu->pmu);
nvdimm_pmu_free_hotplug_memory(nd_pmu);
kfree(nd_pmu);
}
EXPORT_SYMBOL_GPL(unregister_nvdimm_pmu);
......@@ -15,6 +15,10 @@ static int nd_region_probe(struct device *dev)
static unsigned long once;
struct nd_region_data *ndrd;
struct nd_region *nd_region = to_nd_region(dev);
struct range range = {
.start = nd_region->ndr_start,
.end = nd_region->ndr_start + nd_region->ndr_size - 1,
};
if (nd_region->num_lanes > num_online_cpus()
&& nd_region->num_lanes < num_possible_cpus()
......@@ -30,25 +34,13 @@ static int nd_region_probe(struct device *dev)
if (rc)
return rc;
rc = nd_blk_region_init(nd_region);
if (rc)
return rc;
if (is_memory(&nd_region->dev)) {
struct range range = {
.start = nd_region->ndr_start,
.end = nd_region->ndr_start + nd_region->ndr_size - 1,
};
if (devm_init_badblocks(dev, &nd_region->bb))
return -ENODEV;
nd_region->bb_state = sysfs_get_dirent(nd_region->dev.kobj.sd,
"badblocks");
if (!nd_region->bb_state)
dev_warn(&nd_region->dev,
"'badblocks' notification disabled\n");
nvdimm_badblocks_populate(nd_region, &nd_region->bb, &range);
}
if (devm_init_badblocks(dev, &nd_region->bb))
return -ENODEV;
nd_region->bb_state =
sysfs_get_dirent(nd_region->dev.kobj.sd, "badblocks");
if (!nd_region->bb_state)
dev_warn(dev, "'badblocks' notification disabled\n");
nvdimm_badblocks_populate(nd_region, &nd_region->bb, &range);
rc = nd_region_register_namespaces(nd_region, &err);
if (rc < 0)
......@@ -158,4 +150,3 @@ void nd_region_exit(void)
}
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_REGION_PMEM);
MODULE_ALIAS_ND_DEVICE(ND_DEVICE_REGION_BLK);
......@@ -134,10 +134,7 @@ static void nd_region_release(struct device *dev)
}
free_percpu(nd_region->lane);
memregion_free(nd_region->id);
if (is_nd_blk(dev))
kfree(to_nd_blk_region(dev));
else
kfree(nd_region);
kfree(nd_region);
}
struct nd_region *to_nd_region(struct device *dev)
......@@ -157,33 +154,12 @@ struct device *nd_region_dev(struct nd_region *nd_region)
}
EXPORT_SYMBOL_GPL(nd_region_dev);
struct nd_blk_region *to_nd_blk_region(struct device *dev)
{
struct nd_region *nd_region = to_nd_region(dev);
WARN_ON(!is_nd_blk(dev));
return container_of(nd_region, struct nd_blk_region, nd_region);
}
EXPORT_SYMBOL_GPL(to_nd_blk_region);
void *nd_region_provider_data(struct nd_region *nd_region)
{
return nd_region->provider_data;
}
EXPORT_SYMBOL_GPL(nd_region_provider_data);
void *nd_blk_region_provider_data(struct nd_blk_region *ndbr)
{
return ndbr->blk_provider_data;
}
EXPORT_SYMBOL_GPL(nd_blk_region_provider_data);
void nd_blk_region_set_provider_data(struct nd_blk_region *ndbr, void *data)
{
ndbr->blk_provider_data = data;
}
EXPORT_SYMBOL_GPL(nd_blk_region_set_provider_data);
/**
* nd_region_to_nstype() - region to an integer namespace type
* @nd_region: region-device to interrogate
......@@ -208,8 +184,6 @@ int nd_region_to_nstype(struct nd_region *nd_region)
return ND_DEVICE_NAMESPACE_PMEM;
else
return ND_DEVICE_NAMESPACE_IO;
} else if (is_nd_blk(&nd_region->dev)) {
return ND_DEVICE_NAMESPACE_BLK;
}
return 0;
......@@ -332,14 +306,12 @@ static DEVICE_ATTR_RO(set_cookie);
resource_size_t nd_region_available_dpa(struct nd_region *nd_region)
{
resource_size_t blk_max_overlap = 0, available, overlap;
resource_size_t available;
int i;
WARN_ON(!is_nvdimm_bus_locked(&nd_region->dev));
retry:
available = 0;
overlap = blk_max_overlap;
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
......@@ -348,15 +320,7 @@ resource_size_t nd_region_available_dpa(struct nd_region *nd_region)
if (!ndd)
return 0;
if (is_memory(&nd_region->dev)) {
available += nd_pmem_available_dpa(nd_region,
nd_mapping, &overlap);
if (overlap > blk_max_overlap) {
blk_max_overlap = overlap;
goto retry;
}
} else if (is_nd_blk(&nd_region->dev))
available += nd_blk_available_dpa(nd_region);
available += nd_pmem_available_dpa(nd_region, nd_mapping);
}
return available;
......@@ -364,26 +328,17 @@ resource_size_t nd_region_available_dpa(struct nd_region *nd_region)
resource_size_t nd_region_allocatable_dpa(struct nd_region *nd_region)
{
resource_size_t available = 0;
resource_size_t avail = 0;
int i;
if (is_memory(&nd_region->dev))
available = PHYS_ADDR_MAX;
WARN_ON(!is_nvdimm_bus_locked(&nd_region->dev));
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
if (is_memory(&nd_region->dev))
available = min(available,
nd_pmem_max_contiguous_dpa(nd_region,
nd_mapping));
else if (is_nd_blk(&nd_region->dev))
available += nd_blk_available_dpa(nd_region);
avail = min_not_zero(avail, nd_pmem_max_contiguous_dpa(
nd_region, nd_mapping));
}
if (is_memory(&nd_region->dev))
return available * nd_region->ndr_mappings;
return available;
return avail * nd_region->ndr_mappings;
}
static ssize_t available_size_show(struct device *dev,
......@@ -693,9 +648,8 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n)
&& a != &dev_attr_available_size.attr)
return a->mode;
if ((type == ND_DEVICE_NAMESPACE_PMEM
|| type == ND_DEVICE_NAMESPACE_BLK)
&& a == &dev_attr_available_size.attr)
if (type == ND_DEVICE_NAMESPACE_PMEM &&
a == &dev_attr_available_size.attr)
return a->mode;
else if (is_memory(dev) && nd_set)
return a->mode;
......@@ -828,12 +782,6 @@ static const struct attribute_group *nd_region_attribute_groups[] = {
NULL,
};
static const struct device_type nd_blk_device_type = {
.name = "nd_blk",
.release = nd_region_release,
.groups = nd_region_attribute_groups,
};
static const struct device_type nd_pmem_device_type = {
.name = "nd_pmem",
.release = nd_region_release,
......@@ -851,11 +799,6 @@ bool is_nd_pmem(struct device *dev)
return dev ? dev->type == &nd_pmem_device_type : false;
}
bool is_nd_blk(struct device *dev)
{
return dev ? dev->type == &nd_blk_device_type : false;
}
bool is_nd_volatile(struct device *dev)
{
return dev ? dev->type == &nd_volatile_device_type : false;
......@@ -929,22 +872,6 @@ void nd_region_advance_seeds(struct nd_region *nd_region, struct device *dev)
nvdimm_bus_unlock(dev);
}
int nd_blk_region_init(struct nd_region *nd_region)
{
struct device *dev = &nd_region->dev;
struct nvdimm_bus *nvdimm_bus = walk_to_nvdimm_bus(dev);
if (!is_nd_blk(dev))
return 0;
if (nd_region->ndr_mappings < 1) {
dev_dbg(dev, "invalid BLK region\n");
return -ENXIO;
}
return to_nd_blk_region(dev)->enable(nvdimm_bus, dev);
}
/**
* nd_region_acquire_lane - allocate and lock a lane
* @nd_region: region id and number of lanes possible
......@@ -1007,23 +934,12 @@ EXPORT_SYMBOL(nd_region_release_lane);
static unsigned long default_align(struct nd_region *nd_region)
{
unsigned long align;
int i, mappings;
u32 remainder;
int mappings;
if (is_nd_blk(&nd_region->dev))
align = MEMREMAP_COMPAT_ALIGN_MAX;
if (nd_region->ndr_size < MEMREMAP_COMPAT_ALIGN_MAX)
align = PAGE_SIZE;
else
align = MEMREMAP_COMPAT_ALIGN_MAX;
for (i = 0; i < nd_region->ndr_mappings; i++) {
struct nd_mapping *nd_mapping = &nd_region->mapping[i];
struct nvdimm *nvdimm = nd_mapping->nvdimm;
if (test_bit(NDD_ALIASING, &nvdimm->flags)) {
align = MEMREMAP_COMPAT_ALIGN_MAX;
break;
}
}
mappings = max_t(u16, 1, nd_region->ndr_mappings);
div_u64_rem(align, mappings, &remainder);
......@@ -1039,7 +955,6 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
{
struct nd_region *nd_region;
struct device *dev;
void *region_buf;
unsigned int i;
int ro = 0;
......@@ -1057,36 +972,13 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
if (test_bit(NDD_UNARMED, &nvdimm->flags))
ro = 1;
if (test_bit(NDD_NOBLK, &nvdimm->flags)
&& dev_type == &nd_blk_device_type) {
dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not BLK capable\n",
caller, dev_name(&nvdimm->dev), i);
return NULL;
}
}
if (dev_type == &nd_blk_device_type) {
struct nd_blk_region_desc *ndbr_desc;
struct nd_blk_region *ndbr;
ndbr_desc = to_blk_region_desc(ndr_desc);
ndbr = kzalloc(sizeof(*ndbr) + sizeof(struct nd_mapping)
* ndr_desc->num_mappings,
GFP_KERNEL);
if (ndbr) {
nd_region = &ndbr->nd_region;
ndbr->enable = ndbr_desc->enable;
ndbr->do_io = ndbr_desc->do_io;
}
region_buf = ndbr;
} else {
nd_region = kzalloc(struct_size(nd_region, mapping,
ndr_desc->num_mappings),
GFP_KERNEL);
region_buf = nd_region;
}
nd_region =
kzalloc(struct_size(nd_region, mapping, ndr_desc->num_mappings),
GFP_KERNEL);
if (!region_buf)
if (!nd_region)
return NULL;
nd_region->id = memregion_alloc(GFP_KERNEL);
if (nd_region->id < 0)
......@@ -1150,7 +1042,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
err_percpu:
memregion_free(nd_region->id);
err_id:
kfree(region_buf);
kfree(nd_region);
return NULL;
}
......@@ -1163,17 +1055,6 @@ struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
}
EXPORT_SYMBOL_GPL(nvdimm_pmem_region_create);
struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region_desc *ndr_desc)
{
if (ndr_desc->num_mappings > 1)
return NULL;
ndr_desc->num_lanes = min(ndr_desc->num_lanes, ND_MAX_LANES);
return nd_region_create(nvdimm_bus, ndr_desc, &nd_blk_device_type,
__func__);
}
EXPORT_SYMBOL_GPL(nvdimm_blk_region_create);
struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region_desc *ndr_desc)
{
......@@ -1198,7 +1079,7 @@ int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
}
/**
* nvdimm_flush - flush any posted write queues between the cpu and pmem media
* @nd_region: blk or interleaved pmem region
* @nd_region: interleaved pmem region
*/
int generic_nvdimm_flush(struct nd_region *nd_region)
{
......@@ -1231,7 +1112,7 @@ EXPORT_SYMBOL_GPL(nvdimm_flush);
/**
* nvdimm_has_flush - determine write flushing requirements
* @nd_region: blk or interleaved pmem region
* @nd_region: interleaved pmem region
*
* Returns 1 if writes require flushing
* Returns 0 if writes do not require flushing
......
......@@ -25,8 +25,6 @@ struct badrange {
};
enum {
/* when a dimm supports both PMEM and BLK access a label is required */
NDD_ALIASING = 0,
/* unarmed memory devices may not persist writes */
NDD_UNARMED = 1,
/* locked memory devices should not be accessed */
......@@ -35,8 +33,6 @@ enum {
NDD_SECURITY_OVERWRITE = 3,
/* tracking whether or not there is a pending device reference */
NDD_WORK_PENDING = 4,
/* ignore / filter NSLABEL_FLAG_LOCAL for this DIMM, i.e. no aliasing */
NDD_NOBLK = 5,
/* dimm supports namespace labels */
NDD_LABELING = 6,
......@@ -140,21 +136,6 @@ static inline void __iomem *devm_nvdimm_ioremap(struct device *dev,
}
struct nvdimm_bus;
struct module;
struct nd_blk_region;
struct nd_blk_region_desc {
int (*enable)(struct nvdimm_bus *nvdimm_bus, struct device *dev);
int (*do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
void *iobuf, u64 len, int rw);
struct nd_region_desc ndr_desc;
};
static inline struct nd_blk_region_desc *to_blk_region_desc(
struct nd_region_desc *ndr_desc)
{
return container_of(ndr_desc, struct nd_blk_region_desc, ndr_desc);
}
/*
* Note that separate bits for locked + unlocked are defined so that
......@@ -257,7 +238,6 @@ struct nvdimm_bus *nvdimm_to_bus(struct nvdimm *nvdimm);
struct nvdimm *to_nvdimm(struct device *dev);
struct nd_region *to_nd_region(struct device *dev);
struct device *nd_region_dev(struct nd_region *nd_region);
struct nd_blk_region *to_nd_blk_region(struct device *dev);
struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus);
const char *nvdimm_name(struct nvdimm *nvdimm);
......@@ -295,10 +275,6 @@ struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
struct nd_region_desc *ndr_desc);
void *nd_region_provider_data(struct nd_region *nd_region);
void *nd_blk_region_provider_data(struct nd_blk_region *ndbr);
void nd_blk_region_set_provider_data(struct nd_blk_region *ndbr, void *data);
struct nvdimm *nd_blk_region_to_dimm(struct nd_blk_region *ndbr);
unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr);
unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
u64 nd_fletcher64(void *addr, size_t len, bool le);
......
......@@ -8,6 +8,7 @@
#include <linux/ndctl.h>
#include <linux/device.h>
#include <linux/badblocks.h>
#include <linux/perf_event.h>
enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
......@@ -23,6 +24,57 @@ enum nvdimm_claim_class {
NVDIMM_CCLASS_UNKNOWN,
};
#define NVDIMM_EVENT_VAR(_id) event_attr_##_id
#define NVDIMM_EVENT_PTR(_id) (&event_attr_##_id.attr.attr)
#define NVDIMM_EVENT_ATTR(_name, _id) \
PMU_EVENT_ATTR(_name, NVDIMM_EVENT_VAR(_id), _id, \
nvdimm_events_sysfs_show)
/* Event attribute array index */
#define NVDIMM_PMU_FORMAT_ATTR 0
#define NVDIMM_PMU_EVENT_ATTR 1
#define NVDIMM_PMU_CPUMASK_ATTR 2
#define NVDIMM_PMU_NULL_ATTR 3
/**
* struct nvdimm_pmu - data structure for nvdimm perf driver
* @pmu: pmu data structure for nvdimm performance stats.
* @dev: nvdimm device pointer.
* @cpu: designated cpu for counter access.
* @node: node for cpu hotplug notifier link.
* @cpuhp_state: state for cpu hotplug notification.
* @arch_cpumask: cpumask to get designated cpu for counter access.
*/
struct nvdimm_pmu {
struct pmu pmu;
struct device *dev;
int cpu;
struct hlist_node node;
enum cpuhp_state cpuhp_state;
/* cpumask provided by arch/platform specific code */
struct cpumask arch_cpumask;
};
struct platform_device;
#ifdef CONFIG_PERF_EVENTS
extern ssize_t nvdimm_events_sysfs_show(struct device *dev,
struct device_attribute *attr,
char *page);
int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
#else
static inline int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev)
{
return -ENXIO;
}
static inline void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu) { }
#endif
struct nd_device_driver {
struct device_driver drv;
unsigned long type;
......@@ -92,27 +144,6 @@ struct nd_namespace_pmem {
int id;
};
/**
* struct nd_namespace_blk - namespace for dimm-bounded persistent memory
* @alt_name: namespace name supplied in the dimm label
* @uuid: namespace name supplied in the dimm label
* @id: ida allocated id
* @lbasize: blk namespaces have a native sector size when btt not present
* @size: sum of all the resource ranges allocated to this namespace
* @num_resources: number of dpa extents to claim
* @res: discontiguous dpa extents for given dimm
*/
struct nd_namespace_blk {
struct nd_namespace_common common;
char *alt_name;
uuid_t *uuid;
int id;
unsigned long lbasize;
resource_size_t size;
int num_resources;
struct resource **res;
};
static inline struct nd_namespace_io *to_nd_namespace_io(const struct device *dev)
{
return container_of(dev, struct nd_namespace_io, common.dev);
......@@ -125,11 +156,6 @@ static inline struct nd_namespace_pmem *to_nd_namespace_pmem(const struct device
return container_of(nsio, struct nd_namespace_pmem, nsio);
}
static inline struct nd_namespace_blk *to_nd_namespace_blk(const struct device *dev)
{
return container_of(dev, struct nd_namespace_blk, common.dev);
}
/**
* nvdimm_read_bytes() - synchronously read bytes from an nvdimm namespace
* @ndns: device to read
......
......@@ -189,7 +189,6 @@ static inline const char *nvdimm_cmd_name(unsigned cmd)
#define ND_DEVICE_REGION_BLK 3 /* nd_region: (parent of BLK namespaces) */
#define ND_DEVICE_NAMESPACE_IO 4 /* legacy persistent memory */
#define ND_DEVICE_NAMESPACE_PMEM 5 /* PMEM namespace (may alias with BLK) */
#define ND_DEVICE_NAMESPACE_BLK 6 /* BLK namespace (may alias with PMEM) */
#define ND_DEVICE_DAX_PMEM 7 /* Device DAX interface to pmem */
enum nd_driver_flags {
......@@ -198,7 +197,6 @@ enum nd_driver_flags {
ND_DRIVER_REGION_BLK = 1 << ND_DEVICE_REGION_BLK,
ND_DRIVER_NAMESPACE_IO = 1 << ND_DEVICE_NAMESPACE_IO,
ND_DRIVER_NAMESPACE_PMEM = 1 << ND_DEVICE_NAMESPACE_PMEM,
ND_DRIVER_NAMESPACE_BLK = 1 << ND_DEVICE_NAMESPACE_BLK,
ND_DRIVER_DAX_PMEM = 1 << ND_DEVICE_DAX_PMEM,
};
......
......@@ -27,7 +27,6 @@ ccflags-y += -I$(srctree)/drivers/acpi/nfit/
obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_ACPI_NFIT) += nfit.o
ifeq ($(CONFIG_DAX),m)
......@@ -50,9 +49,6 @@ nd_pmem-y += config_check.o
nd_btt-y := $(NVDIMM_SRC)/btt.o
nd_btt-y += config_check.o
nd_blk-y := $(NVDIMM_SRC)/blk.o
nd_blk-y += config_check.o
nd_e820-y := $(NVDIMM_SRC)/e820.o
nd_e820-y += config_check.o
......
......@@ -11,7 +11,6 @@ void check(void)
BUILD_BUG_ON(!IS_MODULE(CONFIG_BLK_DEV_PMEM));
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_BTT));
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_PFN));
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_BLK));
if (IS_ENABLED(CONFIG_ACPI_NFIT))
BUILD_BUG_ON(!IS_MODULE(CONFIG_ACPI_NFIT));
BUILD_BUG_ON(!IS_MODULE(CONFIG_DEV_DAX));
......
......@@ -338,62 +338,6 @@ static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
static int ndtest_blk_do_io(struct nd_blk_region *ndbr, resource_size_t dpa,
void *iobuf, u64 len, int rw)
{
struct ndtest_dimm *dimm = ndbr->blk_provider_data;
struct ndtest_blk_mmio *mmio = dimm->mmio;
struct nd_region *nd_region = &ndbr->nd_region;
unsigned int lane;
if (!mmio)
return -ENOMEM;
lane = nd_region_acquire_lane(nd_region);
if (rw)
memcpy(mmio->base + dpa, iobuf, len);
else {
memcpy(iobuf, mmio->base + dpa, len);
arch_invalidate_pmem(mmio->base + dpa, len);
}
nd_region_release_lane(nd_region, lane);
return 0;
}
static int ndtest_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
struct device *dev)
{
struct nd_blk_region *ndbr = to_nd_blk_region(dev);
struct nvdimm *nvdimm;
struct ndtest_dimm *dimm;
struct ndtest_blk_mmio *mmio;
nvdimm = nd_blk_region_to_dimm(ndbr);
dimm = nvdimm_provider_data(nvdimm);
nd_blk_region_set_provider_data(ndbr, dimm);
dimm->blk_region = to_nd_region(dev);
mmio = devm_kzalloc(dev, sizeof(struct ndtest_blk_mmio), GFP_KERNEL);
if (!mmio)
return -ENOMEM;
mmio->base = (void __iomem *) devm_nvdimm_memremap(
dev, dimm->address, 12, nd_blk_memremap_flags(ndbr));
if (!mmio->base) {
dev_err(dev, "%s failed to map blk dimm\n", nvdimm_name(nvdimm));
return -ENOMEM;
}
mmio->size = dimm->size;
mmio->base_offset = 0;
dimm->mmio = mmio;
return 0;
}
static struct nfit_test_resource *ndtest_resource_lookup(resource_size_t addr)
{
int i;
......@@ -523,17 +467,16 @@ static int ndtest_create_region(struct ndtest_priv *p,
struct ndtest_region *region)
{
struct nd_mapping_desc mappings[NDTEST_MAX_MAPPING];
struct nd_blk_region_desc ndbr_desc;
struct nd_region_desc *ndr_desc, _ndr_desc;
struct nd_interleave_set *nd_set;
struct nd_region_desc *ndr_desc;
struct resource res;
int i, ndimm = region->mapping[0].dimm;
u64 uuid[2];
memset(&res, 0, sizeof(res));
memset(&mappings, 0, sizeof(mappings));
memset(&ndbr_desc, 0, sizeof(ndbr_desc));
ndr_desc = &ndbr_desc.ndr_desc;
memset(&_ndr_desc, 0, sizeof(_ndr_desc));
ndr_desc = &_ndr_desc;
if (!ndtest_alloc_resource(p, region->size, &res.start))
return -ENOMEM;
......@@ -857,10 +800,8 @@ static int ndtest_dimm_register(struct ndtest_priv *priv,
struct device *dev = &priv->pdev.dev;
unsigned long dimm_flags = dimm->flags;
if (dimm->num_formats > 1) {
set_bit(NDD_ALIASING, &dimm_flags);
if (dimm->num_formats > 1)
set_bit(NDD_LABELING, &dimm_flags);
}
if (dimm->flags & PAPR_PMEM_UNARMED_MASK)
set_bit(NDD_UNARMED, &dimm_flags);
......
......@@ -2842,28 +2842,6 @@ static void nfit_test1_setup(struct nfit_test *t)
set_bit(ND_CMD_SET_CONFIG_DATA, &acpi_desc->dimm_cmd_force_en);
}
static int nfit_test_blk_do_io(struct nd_blk_region *ndbr, resource_size_t dpa,
void *iobuf, u64 len, int rw)
{
struct nfit_blk *nfit_blk = ndbr->blk_provider_data;
struct nfit_blk_mmio *mmio = &nfit_blk->mmio[BDW];
struct nd_region *nd_region = &ndbr->nd_region;
unsigned int lane;
lane = nd_region_acquire_lane(nd_region);
if (rw)
memcpy(mmio->addr.base + dpa, iobuf, len);
else {
memcpy(iobuf, mmio->addr.base + dpa, len);
/* give us some some coverage of the arch_invalidate_pmem() API */
arch_invalidate_pmem(mmio->addr.base + dpa, len);
}
nd_region_release_lane(nd_region, lane);
return 0;
}
static unsigned long nfit_ctl_handle;
union acpi_object *result;
......@@ -3219,7 +3197,6 @@ static int nfit_test_probe(struct platform_device *pdev)
nfit_test->setup(nfit_test);
acpi_desc = &nfit_test->acpi_desc;
acpi_nfit_desc_init(acpi_desc, &pdev->dev);
acpi_desc->blk_do_io = nfit_test_blk_do_io;
nd_desc = &acpi_desc->nd_desc;
nd_desc->provider_name = NULL;
nd_desc->module = THIS_MODULE;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment