• Linus Torvalds's avatar
    Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · f67e3fb4
    Linus Torvalds authored
    Pull device-dax updates from Dan Williams:
     "New device-dax infrastructure to allow persistent memory and other
      "reserved" / performance differentiated memories, to be assigned to
      the core-mm as "System RAM".
    
      Some users want to use persistent memory as additional volatile
      memory. They are willing to cope with potential performance
      differences, for example between DRAM and 3D Xpoint, and want to use
      typical Linux memory management apis rather than a userspace memory
      allocator layered over an mmap() of a dax file. The administration
      model is to decide how much Persistent Memory (pmem) to use as System
      RAM, create a device-dax-mode namespace of that size, and then assign
      it to the core-mm. The rationale for device-dax is that it is a
      generic memory-mapping driver that can be layered over any "special
      purpose" memory, not just pmem. On subsequent boots udev rules can be
      used to restore the memory assignment.
    
      One implication of using pmem as RAM is that mlock() no longer keeps
      data off persistent media. For this reason it is recommended to enable
      NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
      at rest. We considered making this recommendation an actively enforced
      requirement, but in the end decided to leave it as a distribution /
      administrator policy to allow for emulation and test environments that
      lack security capable NVDIMMs.
    
      Summary:
    
       - Replace the /sys/class/dax device model with /sys/bus/dax, and
         include a compat driver so distributions can opt-in to the new ABI.
    
       - Allow for an alternative driver for the device-dax address-range
    
       - Introduce the 'kmem' driver to hotplug / assign a device-dax
         address-range to the core-mm.
    
       - Arrange for the device-dax target-node to be onlined so that the
         newly added memory range can be uniquely referenced by numa apis"
    
    NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
    we currently have special - and very annoying rules in the kernel about
    accessing PMEM only with the "MC safe" accessors, because machine checks
    inside the regular repeat string copy functions can be fatal in some
    (not described) circumstances.
    
    And apparently the PMEM modules can cause that a lot more than regular
    RAM.  The argument is that this happens because PMEM doesn't necessarily
    get scrubbed at boot like RAM does, but that is planned to be added for
    the user space tooling.
    
    Quoting Dan from another email:
     "The exposure can be reduced in the volatile-RAM case by scanning for
      and clearing errors before it is onlined as RAM. The userspace tooling
      for that can be in place before v5.1-final. There's also runtime
      notifications of errors via acpi_nfit_uc_error_notify() from
      background scrubbers on the DIMM devices. With that mechanism the
      kernel could proactively clear newly discovered poison in the volatile
      case, but that would be additional development more suitable for v5.2.
    
      I understand the concern, and the need to highlight this issue by
      tapping the brakes on feature development, but I don't see PMEM as RAM
      making the situation worse when the exposure is also there via DAX in
      the PMEM case. Volatile-RAM is arguably a safer use case since it's
      possible to repair pages where the persistent case needs active
      application coordination"
    
    * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
      device-dax: "Hotplug" persistent memory for use like normal RAM
      mm/resource: Let walk_system_ram_range() search child resources
      mm/memory-hotplug: Allow memory resources to be children
      mm/resource: Move HMM pr_debug() deeper into resource code
      mm/resource: Return real error codes from walk failures
      device-dax: Add a 'modalias' attribute to DAX 'bus' devices
      device-dax: Add a 'target_node' attribute
      device-dax: Auto-bind device after successful new_id
      acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
      device-dax: Add /sys/class/dax backwards compatibility
      device-dax: Add support for a dax override driver
      device-dax: Move resource pinning+mapping into the common driver
      device-dax: Introduce bus + driver model
      device-dax: Start defining a dax bus model
      device-dax: Remove multi-resource infrastructure
      device-dax: Kill dax_region base
      device-dax: Kill dax_region ida
    f67e3fb4
numa.c 12.8 KB