1. 19 Jul, 2022 22 commits
    • Heiko Carstens's avatar
      e2f39c9f
    • s390/cpufeature: rework to allow more than only hwcap bits · 0a5f9b38
      Heiko Carstens authored
      Rework the cpufeature implementation to allow for various cpu feature
      indications, not limited to hwcap bits. This is achieved by adding a
      sequential list of cpu feature numbers, each of which is mapped to an
      entry that indicates what the number is about.
      
      Each entry contains a type member, which indicates which feature
      name space to look into (e.g. hwcap or cpu facility). If desired, this
      also allows modules to be loaded automatically only in, e.g., z/VM
      configurations.
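The mapping described above can be pictured roughly as follows. This is a minimal user-space sketch, not the kernel code: the type names, the feature numbers and the bit positions are all invented for illustration, and the two capability sources are modelled as plain bit masks.

```c
#include <stdbool.h>

/* Hypothetical feature name spaces; the commit mentions hwcap and cpu facility. */
enum feat_type { FEAT_TYPE_HWCAP, FEAT_TYPE_FACILITY };

/* One table entry: which name space to consult and which bit inside it. */
struct feat_entry {
	enum feat_type type;
	unsigned int num;
};

/* Sequential feature numbers index the table (names and values are made up). */
enum { FEAT_A, FEAT_B, FEAT_C, FEAT_MAX };

static const struct feat_entry feat_table[FEAT_MAX] = {
	[FEAT_A] = { FEAT_TYPE_HWCAP, 3 },
	[FEAT_B] = { FEAT_TYPE_HWCAP, 7 },
	[FEAT_C] = { FEAT_TYPE_FACILITY, 12 },
};

/* Stand-ins for the real capability sources. */
struct cpu_state {
	unsigned long hwcap;
	unsigned long facilities;
};

/* Resolve a feature number through the table and test the matching bit. */
static bool feat_present(const struct cpu_state *cpu, unsigned int feature)
{
	const struct feat_entry *e;

	if (feature >= FEAT_MAX)
		return false;
	e = &feat_table[feature];
	switch (e->type) {
	case FEAT_TYPE_HWCAP:
		return (cpu->hwcap & (1UL << e->num)) != 0;
	case FEAT_TYPE_FACILITY:
		return (cpu->facilities & (1UL << e->num)) != 0;
	}
	return false;
}
```

Adding a further name space would then only need a new enum value and one more case in the switch, which is the flexibility the commit message is after.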
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Steffen Eiden <seiden@linux.ibm.com>
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220713125644.16121-2-seiden@linux.ibm.com
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • MAINTAINERS: pick up all vfio_ap docs for VFIO AP maintainers · 693714b9
      Tony Krowiak authored
      A new document, Documentation/s390/vfio-ap-locking.rst, was added. Make sure
      the new document is picked up for the VFIO AP maintainers by using a
      wildcard: Documentation/s390/vfio-ap*.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/Docs: new doc describing lock usage by the vfio_ap device driver · e32d3827
      Tony Krowiak authored
      Introduces a new document describing the locks used by the vfio_ap device
      driver and how to use them so as to avoid lockdep reports and deadlock
      situations.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: update docs to include dynamic config support · cb269e0a
      Tony Krowiak authored
      Update the documentation in vfio-ap.rst to include information about the
      AP dynamic configuration support (e.g., hot plug of adapters, domains
      and control domains via the matrix mediated device's sysfs assignment
      attributes). This patch also makes a few minor tweaks to make corrections
      and clarifications.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: handle config changed and scan complete notification · eeb386ae
      Tony Krowiak authored
      This patch implements two new AP driver callbacks:
      
      void (*on_config_changed)(struct ap_config_info *new_config_info,
                        struct ap_config_info *old_config_info);
      
      void (*on_scan_complete)(struct ap_config_info *new_config_info,
                       struct ap_config_info *old_config_info);
      
      The on_config_changed callback is invoked at the start of the AP bus scan
      function when it determines that the host AP configuration information
      has changed since the previous scan.
      
      The vfio_ap device driver registers a callback function for this callback
      that performs the following operations:
      
      1. Unplugs the adapters, domains and control domains removed from the
      host's AP configuration from the guests to which they are
      assigned in a single operation.
      
      2. Stores bitmaps identifying the adapters, domains and control domains
      added to the host's AP configuration with the structure representing
      the mediated device. When the vfio_ap device driver's probe callback is
      subsequently invoked, the probe function will recognize that the
      queue is being probed due to a change in the host's AP configuration
      and the plugging of the queue into the guest will be bypassed.
      
      The on_scan_complete callback is invoked after the AP bus scan is
      completed if the host AP configuration data has changed. The vfio_ap
      device driver registers a callback function for this callback that hot
      plugs each queue and control domain added to the AP configuration for each
      guest using them in a single hot plug operation.
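To make the "removed" and "added" sets above concrete, here is a small sketch of how the two configurations passed to the callbacks could be compared. It is an assumption-laden illustration: `struct cfg_sketch` is a simplified stand-in for `struct ap_config_info`, the masks are squeezed into 64 bits, and `cfg_removed()` and `cfg_added()` are hypothetical helper names.

```c
#include <stdint.h>

/*
 * Simplified stand-in for struct ap_config_info: one bit per adapter,
 * domain or control domain. The real masks are wider; 64 bits keeps
 * the sketch short.
 */
struct cfg_sketch {
	uint64_t apm; /* adapters */
	uint64_t aqm; /* domains */
	uint64_t adm; /* control domains */
};

/* Resources present in old_cfg but absent from new_cfg: candidates to unplug. */
static struct cfg_sketch cfg_removed(const struct cfg_sketch *new_cfg,
				     const struct cfg_sketch *old_cfg)
{
	struct cfg_sketch r = {
		.apm = old_cfg->apm & ~new_cfg->apm,
		.aqm = old_cfg->aqm & ~new_cfg->aqm,
		.adm = old_cfg->adm & ~new_cfg->adm,
	};
	return r;
}

/* Resources present in new_cfg but absent from old_cfg: candidates to plug. */
static struct cfg_sketch cfg_added(const struct cfg_sketch *new_cfg,
				   const struct cfg_sketch *old_cfg)
{
	struct cfg_sketch a = {
		.apm = new_cfg->apm & ~old_cfg->apm,
		.aqm = new_cfg->aqm & ~old_cfg->aqm,
		.adm = new_cfg->adm & ~old_cfg->adm,
	};
	return a;
}
```

In terms of the commit message, the first result would drive the single unplug operation in on_config_changed, while the second would be stored and acted on once on_scan_complete runs.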
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: sysfs attribute to display the guest's matrix · f7f795c5
      Tony Krowiak authored
      The matrix of adapters and domains configured in a guest's APCB may
      differ from the matrix of adapters and domains assigned to the matrix mdev,
      so this patch introduces a sysfs attribute to display the matrix of
      adapters and domains that are or will be assigned to the APCB of a guest
      that is or will be using the matrix mdev. For a matrix mdev denoted by
      $uuid, the guest matrix can be displayed as follows:
      
         cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: implement in-use callback for vfio_ap driver · 3f85d1df
      Tony Krowiak authored
      Let's implement the callback to indicate when an APQN
      is in use by the vfio_ap device driver. The callback is
      invoked whenever a change to the apmask or aqmask would
      result in one or more queue devices being removed from the driver. The
      vfio_ap device driver will indicate a resource is in use
      if the APQN of any of the queue devices to be removed is assigned to
      any of the matrix mdevs under the driver's control.
      
      There is potential for a deadlock condition between the
      matrix_dev->guests_lock used to lock the guest during assignment of
      adapters and domains and the ap_perms_mutex locked by the AP bus when
      changes are made to the sysfs apmask/aqmask attributes.
      
      The AP Perms lock controls access to the objects that store the adapter
      numbers (ap_perms) and domain numbers (aq_perms) for the sysfs
      /sys/bus/ap/apmask and /sys/bus/ap/aqmask attributes. These attributes
      identify which queues are reserved for the zcrypt default device drivers.
      Before allowing a bit to be removed from either mask, the AP bus must check
      with the vfio_ap device driver to verify that none of the queues are
      assigned to any of its mediated devices.
      
      The apmask/aqmask attributes can be written or read at any time from
      userspace, so care must be taken to prevent a deadlock with asynchronous
      operations that might be taking place in the vfio_ap device driver. For
      example, consider the following:
      
      1. A system administrator assigns an adapter to a mediated device under the
         control of the vfio_ap device driver. The driver will need to first take
         the matrix_dev->guests_lock to potentially hot plug the adapter into
         the KVM guest.
      2. At the same time, a system administrator sets a bit in the sysfs
         /sys/bus/ap/apmask attribute. To complete the operation, the AP bus
         must:
         a. Take the ap_perms_mutex lock to update the object storing the values
            for the /sys/bus/ap/apmask attribute.
         b. Call the vfio_ap device driver's in-use callback to verify that the
            queues now being reserved for the default zcrypt drivers are not
            assigned to a mediated device owned by the vfio_ap device driver. To
            do the verification, the in-use callback function takes the
            matrix_dev->guests_lock, but has to wait because it is already held
            by the operation in 1 above.
      3. The vfio_ap device driver calls an AP bus function to verify that the
         new queues resulting from the assignment of the adapter in step 1 are
         not reserved for the default zcrypt device driver. This AP bus function
         tries to take the ap_perms_mutex lock but gets stuck waiting for the
         lock due to step 2a above.
      
      Consequently, we have the following deadlock situation:
      
      matrix_dev->guests_lock locked (1)
      ap_perms_mutex lock locked (2a)
      Waiting for matrix_dev->guests_lock (2b) which is currently held (1)
      Waiting for ap_perms_mutex lock (3) which is currently held (2a)
      
      To prevent this deadlock scenario, the function called in step 3 will no
      longer take the ap_perms_mutex lock; instead, the caller must take it.
      The lock will be the first taken by the adapter/domain assignment
      functions in the vfio_ap device driver to maintain the proper locking
      order.
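The in-use test described at the top of this message can be sketched as a bitmap overlap check. This is only an illustration under stated assumptions: `struct mdev_sketch` and `resource_in_use()` are hypothetical names, the masks are reduced to 64 bits, and locking is left out entirely (per the text above, the caller would already hold ap_perms_mutex and the check itself would run under matrix_dev->guests_lock).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified matrix mdev: one bit per assigned adapter (apm) and domain (aqm). */
struct mdev_sketch {
	uint64_t apm;
	uint64_t aqm;
};

/*
 * apm/aqm describe the queues that would be taken away from the driver.
 * An APQN is shared with an mdev only if both its adapter and its domain
 * are assigned to that mdev, so both intersections must be non-empty.
 */
static bool resource_in_use(const struct mdev_sketch *mdevs, size_t n,
			    uint64_t apm, uint64_t aqm)
{
	for (size_t i = 0; i < n; i++) {
		if ((mdevs[i].apm & apm) && (mdevs[i].aqm & aqm))
			return true;
	}
	return false;
}
```

A `true` result corresponds to the driver reporting the resource as in use, in which case the AP bus would refuse the mask change.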
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: reset queues after adapter/domain unassignment · 70aeefe5
      Tony Krowiak authored
      When an adapter or domain is unassigned from an mdev attached to a KVM
      guest, one or more of the guest's queues may get dynamically removed. Since
      the removed queues could get re-assigned to another mdev, they need to be
      reset. So, when an adapter or domain is unassigned from the mdev, the
      queues that are removed from the guest's AP configuration (APCB) will be
      reset.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: hot plug/unplug of AP devices when probed/removed · 09d31ff7
      Tony Krowiak authored
      When an AP queue device is probed or removed, if the mediated device is
      attached to a KVM guest, the mediated device's adapter, domain and
      control domain bitmaps must be filtered to update the guest's APCB. If
      any changes are detected, the updated APCB must then be hot plugged into
      the guest to reflect those changes.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: allow hot plug/unplug of AP devices when assigned/unassigned · 51dc562a
      Tony Krowiak authored
      Let's hot plug an adapter, domain or control domain into the guest when it
      is assigned to a matrix mdev that is attached to a KVM guest. Likewise,
      let's hot unplug an adapter, domain or control domain from the guest when
      it is unassigned from a matrix_mdev that is attached to a KVM guest.
      
      Whenever an assignment or unassignment of an adapter, domain or control
      domain is performed, the APQNs and control domains assigned to the matrix
      mdev will be filtered and assigned to the AP control block
      (APCB) that supplies the AP configuration to the guest so that no
      adapter, domain or control domain that is not in the host's AP
      configuration nor any APQN that does not reference a queue device bound
      to the vfio_ap device driver is assigned.
      
      After updating the APCB, if the mdev is in use by a KVM guest, it is
      hot plugged into the guest to dynamically provide access to the adapters,
      domains and control domains provided via the newly refreshed APCB.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: prepare for dynamic update of guest's APCB on queue probe/remove · 2c1ee898
      Tony Krowiak authored
      The callback functions for probing and removing a queue device must take
      and release the locks required to perform a dynamic update of a guest's
      APCB in the proper order.
      
      The proper order for taking the locks is:
      
              matrix_dev->guests_lock => kvm->lock => matrix_dev->mdevs_lock
      
      The proper order for releasing the locks is:
      
              matrix_dev->mdevs_lock => kvm->lock => matrix_dev->guests_lock
      
      A new helper function is introduced to be used by the probe callback to
      acquire the required locks. Since the probe callback only has
      access to a queue device when it is called, the helper function will find
      the ap_matrix_mdev object to which the queue device's APQN is assigned and
      return it so the KVM guest to which the mdev is attached can be dynamically
      updated.
      
      Note that in order to find the ap_matrix_mdev (matrix_mdev) object, it is
      necessary to search the matrix_dev->mdev_list. This presents a
      locking order dilemma because the matrix_dev->mdevs_lock can't be taken to
      protect against changes to the list while searching for the matrix_mdev to
      which a queue device's APQN is assigned. This is due to the fact that the
      proper locking order requires that the matrix_dev->mdevs_lock be taken
      after both the matrix_dev->guests_lock and the matrix_mdev->kvm->lock.
      Consequently, the matrix_dev->guests_lock will be used to protect against
      removal of a matrix_mdev object from the list while a queue device is
      being probed. This necessitates changes to the mdev probe/remove
      callback functions to take the matrix_dev->guests_lock prior to removing
      a matrix_mdev object from the list.
      
      A new macro is also introduced to acquire the locks required to dynamically
      update the guest's APCB in the proper order when a queue device is
      removed.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: prepare for dynamic update of guest's APCB on assign/unassign · 8ee13ad9
      Tony Krowiak authored
      The functions backing the matrix mdev's sysfs attribute interfaces to
      assign/unassign adapters, domains and control domains must take and
      release the locks required to perform a dynamic update of a guest's APCB
      in the proper order.
      
      The proper order for taking the locks is:
      
      matrix_dev->guests_lock => kvm->lock => matrix_dev->mdevs_lock
      
      The proper order for releasing the locks is:
      
      matrix_dev->mdevs_lock => kvm->lock => matrix_dev->guests_lock
      
      Two new macros are introduced for this purpose: One to take the locks and
      the other to release the locks. These macros will be used by the
      assignment/unassignment functions to prepare for dynamic update of
      the KVM guest's APCB.
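The take and release orders above can be modelled with a tiny rank tracker. This is a user-space sketch, not the kernel macros: it records which "locks" are held instead of taking real mutexes, and every name in it (`lock_tracker`, `track_acquire()`, `get_update_locks()` and so on) is hypothetical. It also enforces strict last-in-first-out release, which mirrors the order quoted in the commit rather than any general mutex requirement.

```c
#include <stdbool.h>

/* Ranks mirror the order given above; a lower rank must be taken first. */
enum lock_rank { GUESTS_LOCK = 1, KVM_LOCK = 2, MDEVS_LOCK = 3 };

/* Records which locks are held, in acquisition order. */
struct lock_tracker {
	enum lock_rank held[3];
	int depth;
};

/* Record an acquisition; fail if it would violate the ranking. */
static bool track_acquire(struct lock_tracker *t, enum lock_rank r)
{
	if (t->depth == 3)
		return false;
	if (t->depth > 0 && t->held[t->depth - 1] >= r)
		return false;
	t->held[t->depth++] = r;
	return true;
}

/* Record a release; only the most recently taken lock may be dropped. */
static bool track_release(struct lock_tracker *t, enum lock_rank r)
{
	if (t->depth == 0 || t->held[t->depth - 1] != r)
		return false;
	t->depth--;
	return true;
}

/* guests_lock => kvm->lock => mdevs_lock */
static bool get_update_locks(struct lock_tracker *t)
{
	return track_acquire(t, GUESTS_LOCK) &&
	       track_acquire(t, KVM_LOCK) &&
	       track_acquire(t, MDEVS_LOCK);
}

/* mdevs_lock => kvm->lock => guests_lock */
static bool release_update_locks(struct lock_tracker *t)
{
	return track_release(t, MDEVS_LOCK) &&
	       track_release(t, KVM_LOCK) &&
	       track_release(t, GUESTS_LOCK);
}
```

Wrapping the sequence in one acquire helper and one release helper is the same idea as the two macros: callers cannot get the order wrong because they never spell it out themselves.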
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: use proper locking order when setting/clearing KVM pointer · b84eb8e0
      Tony Krowiak authored
      The group notifier that handles the VFIO_GROUP_NOTIFY_SET_KVM event must
      use the required locks in proper locking order to dynamically update the
      guest's APCB. The proper locking order is:
      
             1. matrix_dev->guests_lock: required to use the KVM pointer to
                update a KVM guest's APCB.
      
             2. matrix_mdev->kvm->lock: required to update a KVM guest's APCB.
      
             3. matrix_dev->mdevs_lock: required to store or access the data
                stored in a struct ap_matrix_mdev instance.
      
      Two macros are introduced to acquire and release the locks in the proper
      order. These macros are now used by the group notifier functions.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: introduce new mutex to control access to the KVM pointer · 21195eb0
      Tony Krowiak authored
      The vfio_ap device driver registers for notification when the pointer to
      the KVM object for a guest is set. Recall that the KVM lock (kvm->lock)
      mutex must be taken outside of the matrix_dev->lock mutex to prevent the
      reporting by lockdep of a circular locking dependency (a.k.a., a lockdep
      splat):
      
      * see commit 0cc00c8d ("Fix circular lockdep when setting/clearing
        crypto masks")
      
      * see commit 86956e70 ("replace open coded locks for
        VFIO_GROUP_NOTIFY_SET_KVM notification")
      
      With the introduction of support for hot plugging/unplugging AP devices
      passed through to a KVM guest, a new guests_lock mutex is introduced to
      ensure the proper locking order is maintained:
      
      struct ap_matrix_dev {
              ...
              struct mutex guests_lock;
              ...
      };
      
      The matrix_dev->guests_lock controls access to the matrix_mdev instances
      that hold the state for AP devices that have been passed through to a
      KVM guest. This lock must be held to control access to the KVM pointer
      (matrix_mdev->kvm) while the vfio_ap device driver is using it to
      plug/unplug AP devices passed through to the KVM guest.
      
      Keep in mind, the proper locking order must be maintained whenever
      dynamically updating a KVM guest's APCB to plug/unplug adapters, domains
      and control domains:
      
          1. matrix_dev->guests_lock: required to use the KVM pointer - stored in
             a struct ap_matrix_mdev instance - to update a KVM guest's APCB
      
          2. matrix_mdev->kvm->lock: required to update a guest's APCB
      
          3. matrix_dev->mdevs_lock: required to access data stored in a
             struct ap_matrix_mdev instance.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: rename matrix_dev->lock mutex to matrix_dev->mdevs_lock · d0786556
      Tony Krowiak authored
      The matrix_dev->lock mutex is being renamed to matrix_dev->mdevs_lock to
      better reflect its purpose, which is to control access to the state of the
      mediated devices under the control of the vfio_ap device driver.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: allow assignment of unavailable AP queues to mdev device · e2126a73
      Tony Krowiak authored
      The current implementation does not allow assignment of an AP adapter or
      domain to an mdev device if each APQN resulting from the assignment
      does not reference an AP queue device that is bound to the vfio_ap device
      driver. This patch allows assignment of AP resources to the matrix mdev as
      long as the APQNs resulting from the assignment:
         1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
         2. Are not assigned to another matrix mdev.
      
      The rationale behind this is that the AP architecture does not preclude
      assignment of APQNs to an AP configuration profile that are not available
      to the system.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev · 48cae940
      Tony Krowiak authored
      Refresh the guest's APCB by filtering the APQNs and control domain numbers
      assigned to the matrix mdev.
      
      Filtering of APQNs:
      -----------------
      APQNs that do not reference an AP queue device bound to the vfio_ap device
      driver must be filtered from the APQNs assigned to the matrix mdev before
      they can be assigned to the guest's APCB. Given that the APQNs are
      configured in the guest's APCB as a matrix of APIDs (adapters) and APQIs
      (domains), it is not possible to filter an individual APQN. For example,
      suppose the matrix of APQNs is structured as follows:
      
                         APIDs
                   3      4      5
              0  (3,0)  (4,0)  (5,0)
      APQIs   1  (3,1)  (4,1)  (5,1)
              2  (3,2)  (4,2)  (5,2)
      
      Now suppose APQN (4,1) does not reference a queue device bound to the
      vfio_ap device driver. If we filter APID 4, the APQNs (4,0), (4,1) and
      (4,2) will be removed. Similarly, if we filter domain 1, APQNs (3,1),
      (4,1) and (5,1) will be removed.
      
      To resolve this dilemma, the choice was made to filter the APID - in this
      case 4 - from the guest's APCB. The reason for this design decision is
      that the APID references an AP adapter, which is a real hardware device
      that can be physically installed, removed, enabled or disabled; whereas, a
      domain is a partition within the adapter. It therefore better reflects
      reality to remove the APID from the guest's APCB.
      
      Filtering of control domains:
      ----------------------------
      Any control domains that are not assigned to the host's AP configuration
      will be filtered from those assigned to the matrix mdev before assigning
      them to the guest's APCB.
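The filtering rule described above (drop the whole APID as soon as one of its APQNs is not bound) can be sketched as follows. This is an illustration under assumptions: masks are limited to 32 bits, the `bound_fn` predicate and both helper names are hypothetical, and `all_but_4_1()` simply encodes the (4,1) example from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT32(n) (UINT32_C(1) << (n))

/* Hypothetical predicate: is queue (apid, apqi) bound to the driver? */
typedef bool (*bound_fn)(unsigned int apid, unsigned int apqi);

/*
 * Return the adapter mask with every APID removed for which at least one
 * APQN, taken over the assigned domains, is not bound to the driver.
 */
static uint32_t filter_apids(uint32_t apm, uint32_t aqm, bound_fn is_bound)
{
	uint32_t out = apm;

	for (unsigned int apid = 0; apid < 32; apid++) {
		if (!(apm & BIT32(apid)))
			continue;
		for (unsigned int apqi = 0; apqi < 32; apqi++) {
			if (!(aqm & BIT32(apqi)))
				continue;
			if (!is_bound(apid, apqi)) {
				out &= ~BIT32(apid); /* drop the whole adapter */
				break;
			}
		}
	}
	return out;
}

/* Control domains: keep only those also present in the host configuration. */
static uint32_t filter_control_domains(uint32_t adm, uint32_t host_adm)
{
	return adm & host_adm;
}

/* Example predicate matching the text: only APQN (4,1) is unbound. */
static bool all_but_4_1(unsigned int apid, unsigned int apqi)
{
	return !(apid == 4 && apqi == 1);
}
```

With APIDs 3, 4 and 5 and APQIs 0, 1 and 2 assigned, `filter_apids()` with `all_but_4_1` leaves APIDs 3 and 5, so (4,0) and (4,2) disappear together with (4,1), which is the trade-off the message describes.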
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: introduce shadow APCB · 49b0109f
      Tony Krowiak authored
      The APCB is a field within the CRYCB that provides the AP configuration
      to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
      maintain it for the lifespan of the guest.
      
      The shadow APCB serves the following purposes:
      
      1. The shadow APCB can be maintained even when the mediated device is not
         currently in use by a KVM guest. Since the mediated device's AP
         configuration is filtered to ensure that no AP queues are passed through
         to the KVM guest that are not bound to the vfio_ap device driver or
         available to the host, the mediated device's AP configuration may differ
         from the guest's. Having a shadow of a guest's APCB allows us to provide
         a sysfs interface to view the guest's APCB even if the mediated device
         is not currently passed through to a KVM guest. This can aid in
         problem determination when the guest is unexpectedly missing AP
         resources.
      
      2. If filtering were done in-place for the real APCB, the guest could pick
         up a transient state. Because the filtering is done on a shadow, and the
         AP configuration is transferred to the real APCB after the guest is
         started, when AP resources are assigned to or unassigned from the
         mediated device, or when the host configuration changes, the guest's AP
         configuration will
         never be in a transient state.
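The second purpose can be illustrated with a two-step "filter, then publish" sketch. It rests on several assumptions: the structures and function names are hypothetical, the masks are reduced to 64 bits, filtering is simplified to intersecting with what the host offers, and the plain struct copy stands in for whatever mechanism the driver and KVM really use to hand the configuration to the guest.

```c
#include <stdint.h>

/* Simplified APCB: one bit per adapter, domain and control domain. */
struct apcb_sketch {
	uint64_t apm;
	uint64_t aqm;
	uint64_t adm;
};

struct guest_sketch {
	struct apcb_sketch live;   /* what the guest would see */
	struct apcb_sketch shadow; /* scratch copy, safe to modify step by step */
};

/* Step 1: do all filtering on the shadow; the live copy is untouched. */
static void shadow_refresh(struct guest_sketch *g,
			   const struct apcb_sketch *mdev,
			   const struct apcb_sketch *host)
{
	g->shadow.apm = mdev->apm & host->apm;
	g->shadow.aqm = mdev->aqm & host->aqm;
	g->shadow.adm = mdev->adm & host->adm;
}

/* Step 2: publish the finished result in one go. */
static void shadow_commit(struct guest_sketch *g)
{
	g->live = g->shadow;
}
```

Between the two steps the shadow can also be read for diagnostics, which corresponds to the first purpose listed above.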
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: manage link between queue struct and matrix mdev · 11cb2419
      Tony Krowiak authored
      Let's create links between each queue device bound to the vfio_ap device
      driver and the matrix mdev to which the queue's APQN is assigned. The idea
      is to facilitate efficient retrieval of the objects representing the queue
      devices and matrix mdevs as well as to verify that a queue assigned to
      a matrix mdev is bound to the driver.
      
      The links will be created as follows:
      
       * When the queue device is probed, if its APQN is assigned to a matrix
         mdev, the structures representing the queue device and the matrix mdev
         will be linked.
      
       * When an adapter or domain is assigned to a matrix mdev, for each new
         APQN assigned that references a queue device bound to the vfio_ap
         device driver, the structures representing the queue device and the
         matrix mdev will be linked.
      
      The links will be removed as follows:
      
       * When the queue device is removed, if its APQN is assigned to a matrix
         mdev, the link from the structure representing the matrix mdev to the
         structure representing the queue will be removed. Since the storage
         allocated for the vfio_ap_queue will be freed, there is no need to
         remove the link to the matrix_mdev to which the queue's APQN is
         assigned.
      
       * When an adapter or domain is unassigned from a matrix mdev, for each
         APQN unassigned that references a queue device bound to the vfio_ap
         device driver, the structures representing the queue device and the
         matrix mdev will be unlinked.
      
       * When an mdev is removed, the link from any queues assigned to the mdev
         to the mdev will be removed.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c · 260f3ea1
      Tony Krowiak authored
      Let's move the probe and remove callbacks into the vfio_ap_ops.c
      file to keep all code related to managing queues in a single file. This
      way, all functions related to queue management can be removed from the
      vfio_ap_private.h header file defining the public interfaces for the
      vfio_ap device driver.
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • s390/vfio-ap: use new AP bus interface to search for queue devices · 034921cd
      Tony Krowiak authored
      This patch refactors the vfio_ap device driver to use the AP bus's
      ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
      information about a queue that is bound to the vfio_ap device driver.
      The bus's ap_get_qdev() function retrieves the queue device from a
      hashtable keyed by APQN. Compared with looping over the list of devices
      attached to the AP bus, this is more efficient by several orders of
      magnitude.
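For readers unfamiliar with the idea, a lookup keyed by APQN can be sketched like this. It is not the AP bus implementation: the structures, the bucket count, the trivial modulo hash and all function names are made up for illustration, and the APQN layout (adapter in the high byte, domain in the low byte) is stated here as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64

/* Minimal queue record chained inside a bucket. */
struct queue_sketch {
	uint16_t apqn;
	struct queue_sketch *next;
};

struct queue_table {
	struct queue_sketch *bucket[NBUCKETS];
};

/* Assumed layout: adapter ID in the high byte, domain index in the low byte. */
static uint16_t mk_apqn(uint8_t apid, uint8_t apqi)
{
	return (uint16_t)((apid << 8) | apqi);
}

/* Insert at the head of the bucket chosen by a trivial modulo hash. */
static void table_add(struct queue_table *t, struct queue_sketch *q)
{
	unsigned int b = q->apqn % NBUCKETS;

	q->next = t->bucket[b];
	t->bucket[b] = q;
}

/* Walk only the one bucket the APQN hashes to, not every device. */
static struct queue_sketch *table_find(const struct queue_table *t, uint16_t apqn)
{
	for (struct queue_sketch *q = t->bucket[apqn % NBUCKETS]; q; q = q->next) {
		if (q->apqn == apqn)
			return q;
	}
	return NULL;
}
```

The lookup cost depends on the length of a single bucket chain rather than on the total number of devices on the bus, which is where the efficiency gain mentioned above comes from.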
      Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
      Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
  2. 05 Jul, 2022 1 commit
  3. 19 Jun, 2022 11 commits
    • Linux 5.19-rc3 · a111daf0
      Linus Torvalds authored
    • Merge tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 05c6ca85
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Make RESERVE_BRK() work again with older binutils. The recent
         'simplification' broke that.
      
       - Make early #VE handling increment RIP when successful.
      
       - Make the #VE code consistent vs. the RIP adjustments and add
         comments.
      
       - Handle load_unaligned_zeropad() across page boundaries correctly in
         #VE when the second page is shared.
      
      * tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
        x86/tdx: Clarify RIP adjustments in #VE handler
        x86/tdx: Fix early #VE handling
        x86/mm: Fix RESERVE_BRK() for older binutils
    • Merge tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5d770f11
      Linus Torvalds authored
      Pull build tooling updates from Thomas Gleixner:
      
       - Remove obsolete CONFIG_X86_SMAP reference from objtool
      
       - Fix overlapping text section failures in faddr2line for real
      
       - Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it
         with finegrained annotations so objtool can validate that code
         correctly.
      
      * tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage
        faddr2line: Fix overlapping text section failures, the sequel
        objtool: Fix obsolete reference to CONFIG_X86_SMAP
    • Merge tag 'sched-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 727c3991
      Linus Torvalds authored
      Pull scheduler fix from Thomas Gleixner:
       "A single scheduler fix plugging a race between sched_setscheduler()
        and balance_push().
      
        sched_setscheduler() spliced the balance callbacks across a lock
        break which makes it possible for an interleaving schedule() to
        observe an empty list"
      
      * tag 'sched-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Fix balance_push() vs __sched_setscheduler()
    • Merge tag 'locking-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4afb6515
      Linus Torvalds authored
      Pull lockdep fix from Thomas Gleixner:
       "An RT fix for lockdep.
      
        lockdep invokes prandom_u32() to create cookies. This worked until
        prandom_u32() was switched to the real random generator, which takes a
        spinlock for extraction, which does not work on RT when invoked from
        atomic contexts.
      
        lockdep has no requirement for real random numbers and it turns out
        sched_clock() is good enough to create the cookie. That works
        everywhere and is faster"
      
      * tag 'locking-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/lockdep: Use sched_clock() for random numbers
      4afb6515
    • Linus Torvalds's avatar
      Merge tag 'irq-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 36da9f5f
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of interrupt subsystem updates:
      
        Core:
      
         - Ensure runtime power management for chained interrupts
      
        Drivers:
      
         - A collection of OF node refcount fixes
      
         - Unbreak MIPS uniprocessor builds
      
         - Fix xilinx interrupt controller Kconfig dependencies
      
         - Add a missing compatible string to the Uniphier driver"
      
      * tag 'irq-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/loongson-liointc: Use architecture register to get coreid
        irqchip/uniphier-aidet: Add compatible string for NX1 SoC
        dt-bindings: interrupt-controller/uniphier-aidet: Add bindings for NX1 SoC
        irqchip/realtek-rtl: Fix refcount leak in map_interrupts
        irqchip/gic-v3: Fix refcount leak in gic_populate_ppi_partitions
        irqchip/gic-v3: Fix error handling in gic_populate_ppi_partitions
        irqchip/apple-aic: Fix refcount leak in aic_of_ic_init
        irqchip/apple-aic: Fix refcount leak in build_fiq_affinity
        irqchip/gic/realview: Fix refcount leak in realview_gic_of_init
        irqchip/xilinx: Remove microblaze+zynq dependency
        genirq: PM: Use runtime PM for chained interrupts
      36da9f5f
    • Linus Torvalds's avatar
      Merge tag 'char-misc-5.19-rc3-take2' of... · bc94632c
      Linus Torvalds authored
      Merge tag 'char-misc-5.19-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
      
      Pull char/misc driver fixes for real from Greg KH:
       "Let's tag the proper branch this time...
      
        Here are some small char/misc driver fixes for 5.19-rc3 that resolve
        some reported issues.
      
        They include:
      
         - mei driver fixes
      
         - comedi driver fix
      
         - rtsx build warning fix
      
         - fsl-mc-bus driver fix
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      This is what the merge in commit f0ec9c65 _should_ have merged, but
      Greg fat-fingered the pull request and I got some small changes from
      linux-next instead there. Credit to Nathan Chancellor for eagle-eyes.
      
      Link: https://lore.kernel.org/all/Yqywy+Md2AfGDu8v@dev-arch.thelio-3990X/
      
      * tag 'char-misc-5.19-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        bus: fsl-mc-bus: fix KASAN use-after-free in fsl_mc_bus_remove()
        mei: me: add raptor lake point S DID
        mei: hbm: drop capability response on early shutdown
        mei: me: set internal pg flag to off on hardware reset
        misc: rtsx: Fix clang -Wsometimes-uninitialized in rts5261_init_from_hw()
        comedi: vmk80xx: fix expression for tx buffer size
      bc94632c
    • Linus Torvalds's avatar
      Merge tag 'i2c-for-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · ee4eb6ee
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "MAINTAINERS rectifications and a few minor driver fixes"
      
      * tag 'i2c-for-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: mediatek: Fix an error handling path in mtk_i2c_probe()
        i2c: designware: Use standard optional ref clock implementation
        MAINTAINERS: core DT include belongs to core
        MAINTAINERS: add include/dt-bindings/i2c to I2C SUBSYSTEM HOST DRIVERS
        i2c: npcm7xx: Add check for platform_driver_register
        MAINTAINERS: Update Synopsys DesignWare I2C to Supported
      ee4eb6ee
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.19-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 063232b6
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "There's not a whole lot this time around (I'm still on vacation) but
        here are some important fixes for new features merged in -rc1:
      
         - Fix a bug where inode flag changes would accidentally drop nrext64
      
         - Fix a race condition when toggling LARP mode"
      
      * tag 'xfs-5.19-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: preserve DIFLAG2_NREXT64 when setting other inode attributes
        xfs: fix variable state usage
        xfs: fix TOCTOU race involving the new logged xattrs control knob
      063232b6
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 354c6e07
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Fix a variety of bugs, many of which were found by folks using fuzzing
        or error injection.
      
        Also fix up how test_dummy_encryption mount option is handled for the
        new mount API.
      
        Finally, fix/cleanup a number of comments and ext4 Documentation
        files"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix a doubled word "need" in a comment
        ext4: add reserved GDT blocks check
        ext4: make variable "count" signed
        ext4: correct the judgment of BUG in ext4_mb_normalize_request
        ext4: fix bug_on ext4_mb_use_inode_pa
        ext4: fix up test_dummy_encryption handling for new mount API
        ext4: use kmemdup() to replace kmalloc + memcpy
        ext4: fix super block checksum incorrect after mount
        ext4: improve write performance with disabled delalloc
        ext4: fix warning when submitting superblock in ext4_commit_super()
        ext4, doc: remove unnecessary escaping
        ext4: fix incorrect comment in ext4_bio_write_page()
        fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment
      354c6e07
    • Linus Torvalds's avatar
      Merge tag '5.19-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · ace2045e
      Linus Torvalds authored
      Pull cifs client fixes from Steve French:
       "Two cifs debugging improvements - one found to deal with debugging a
        multichannel problem and one for a recent fallocate issue
      
        This does include the two larger multichannel reconnect (dynamically
        adjusting interfaces on reconnect) patches, because we recently found
        an additional problem with multichannel to one server type that I want
        to include at the same time"
      
      * tag '5.19-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: when a channel is not found for server, log its connection id
        smb3: add trace point for SMB2_set_eof
      ace2045e
  4. 18 Jun, 2022 6 commits
    • Xiang wangx's avatar
    • Zhang Yi's avatar
      ext4: add reserved GDT blocks check · b55c3cd1
      Zhang Yi authored
      We hit a NULL pointer dereference when resizing a corrupt ext4 image whose
      resize_inode feature had just been cleared (without running e2fsck). It
      can be reproduced with the steps below. Because the resize_inode feature
      was cleared, ext4_resize_fs() converts the filesystem to meta_bg mode, but
      es->s_reserved_gdt_blocks was not reduced to zero, so we could mistakenly
      call reserve_backup_gdb() and pass an uninitialized resize_inode to it
      when adding new group descriptors.
      
       mkfs.ext4 /dev/sda 3G
       tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
       mount /dev/sda /mnt
       resize2fs /dev/sda 8G
      
       ========
       BUG: kernel NULL pointer dereference, address: 0000000000000028
       CPU: 19 PID: 3243 Comm: resize2fs Not tainted 5.18.0-rc7-00001-gfde086c5ebfd #748
       ...
       RIP: 0010:ext4_flex_group_add+0xe08/0x2570
       ...
       Call Trace:
        <TASK>
        ext4_resize_fs+0xbec/0x1660
        __ext4_ioctl+0x1749/0x24e0
        ext4_ioctl+0x12/0x20
        __x64_sys_ioctl+0xa6/0x110
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f2dd739617b
       ========
      
      The fix is simple: add a check in ext4_resize_begin() to make sure that
      es->s_reserved_gdt_blocks is zero when the resize_inode feature is
      disabled.
      
      Cc: stable@kernel.org
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      b55c3cd1
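The check described in the commit above can be sketched in user-space C. This is only a simplified model, not the kernel patch: `struct sb_model`, `resize_begin_check()`, and the `-EINVAL` return value are hypothetical names and choices made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the superblock fields involved. */
struct sb_model {
    bool     has_resize_inode;     /* models the resize_inode feature flag */
    uint16_t reserved_gdt_blocks;  /* models es->s_reserved_gdt_blocks */
};

/*
 * Refuse to start a resize when resize_inode is disabled but reserved GDT
 * blocks are still recorded. Returns 0 if the resize may proceed.
 * The -EINVAL error code is an assumption made for this sketch.
 */
static int resize_begin_check(const struct sb_model *sb)
{
    if (!sb->has_resize_inode && sb->reserved_gdt_blocks != 0)
        return -EINVAL;
    return 0;
}
```

In this model, an image left in the state produced by the reproduction steps (feature cleared, count still non-zero) is rejected before any group descriptors are added, while a consistent image passes.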
    • Ding Xiang's avatar
      ext4: make variable "count" signed · bc75a6eb
      Ding Xiang authored
      dx_make_map() may now return -EFSCORRUPTED, so change "count" to be a
      signed integer so that we can correctly check for an error code returned
      by dx_make_map().
      
      Fixes: 46c116b9 ("ext4: verify dir block before splitting it")
      Cc: stable@kernel.org
      Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
      Link: https://lore.kernel.org/r/20220530100047.537598-1-dingxiang@cmss.chinamobile.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      bc75a6eb
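Why the signedness matters can be shown with a small user-space C sketch. `make_map_model()` is a hypothetical stand-in for dx_make_map(), and `-EINVAL` stands in for the kernel's -EFSCORRUPTED; neither is taken from the actual patch.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for dx_make_map(): returns the number of map
 * entries on success, or a negative errno when the block is corrupted.
 */
static int make_map_model(int corrupted)
{
    return corrupted ? -EINVAL : 3;
}

/* Stores the result in an unsigned variable, as the code did before the fix. */
static long long count_seen_unsigned(int corrupted)
{
    unsigned int count = (unsigned int)make_map_model(corrupted);
    return (long long)count;  /* a negative errno has wrapped to a large positive value */
}

/* Stores the result in a signed variable, as the code does after the fix. */
static long long count_seen_signed(int corrupted)
{
    int count = make_map_model(corrupted);
    return (long long)count;  /* a negative errno stays negative */
}
```

With the unsigned variable a "count < 0" test can never be true, so the error would be treated as a very large entry count; with the signed variable the same test catches it.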
    • Baokun Li's avatar
      ext4: correct the judgment of BUG in ext4_mb_normalize_request · cf4ff938
      Baokun Li authored
      ext4_mb_normalize_request() can move the logical start of the allocated
      blocks to reduce fragmentation and better utilize preallocation. However,
      the logical block requested as the start of the allocation
      (ac->ac_o_ex.fe_logical) should always be covered by the allocated blocks,
      so check that by changing the "and" to an "or" in the assertion.
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
      Link: https://lore.kernel.org/r/20220528110017.354175-3-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      cf4ff938
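The difference between the "and" and "or" forms of the assertion can be illustrated with a small C model. The function names are hypothetical, and plain unsigned integers are used instead of the kernel's types, so this is a sketch of the logic rather than the actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the half-open window [start, start + size) contains goal. */
static bool covers(uint64_t start, uint64_t size, uint64_t goal)
{
    return start <= goal && goal < start + size;
}

/*
 * "and" form: fires only if the window ends at or before goal AND starts
 * after goal. Without overflow both cannot hold at once, so it never fires.
 */
static bool bug_fires_and(uint64_t start, uint64_t size, uint64_t goal)
{
    return start + size <= goal && start > goal;
}

/* "or" form: fires whenever goal lies outside the window on either side. */
static bool bug_fires_or(uint64_t start, uint64_t size, uint64_t goal)
{
    return start + size <= goal || start > goal;
}
```

In this model the "or" form is exactly the negation of `covers()`, which matches the stated intent that the requested logical block must always be covered by the allocated range.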
    • Baokun Li's avatar
      ext4: fix bug_on ext4_mb_use_inode_pa · a08f789d
      Baokun Li authored
      Hulk Robot reported a BUG_ON:
      ==================================================================
      kernel BUG at fs/ext4/mballoc.c:3211!
      [...]
      RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
      [...]
      Call Trace:
       ext4_mb_new_blocks+0x9df/0x5d30
       ext4_ext_map_blocks+0x1803/0x4d80
       ext4_map_blocks+0x3a4/0x1a10
       ext4_writepages+0x126d/0x2c30
       do_writepages+0x7f/0x1b0
       __filemap_fdatawrite_range+0x285/0x3b0
       file_write_and_wait_range+0xb1/0x140
       ext4_sync_file+0x1aa/0xca0
       vfs_fsync_range+0xfb/0x260
       do_fsync+0x48/0xa0
      [...]
      ==================================================================
      
      The above issue may happen as follows:
      -------------------------------------
      do_fsync
       vfs_fsync_range
        ext4_sync_file
         file_write_and_wait_range
          __filemap_fdatawrite_range
           do_writepages
            ext4_writepages
             mpage_map_and_submit_extent
              mpage_map_one_extent
               ext4_map_blocks
                ext4_mb_new_blocks
                 ext4_mb_normalize_request
                  >>> start + size <= ac->ac_o_ex.fe_logical
                 ext4_mb_regular_allocator
                  ext4_mb_simple_scan_group
                   ext4_mb_use_best_found
                    ext4_mb_new_preallocation
                     ext4_mb_new_inode_pa
                      ext4_mb_use_inode_pa
                       >>> set ac->ac_b_ex.fe_len <= 0
                 ext4_mb_mark_diskspace_used
                  >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
      
      We can easily reproduce this problem with the following commands:
      	`fallocate -l100M disk`
      	`mkfs.ext4 -b 1024 -g 256 disk`
      	`mount disk /mnt`
      	`fsstress -d /mnt -l 0 -n 1000 -p 1`
      
      The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
      Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
      when the size is truncated. So start should be the start position of
      the group where ac_o_ex.fe_logical is located after alignment.
      In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
      is very large, the value calculated by start_off is more accurate.
      
      Cc: stable@kernel.org
      Fixes: cd648b8a ("ext4: trim allocation requests to group size")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
      Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      a08f789d
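The truncation problem described in the commit above can be modeled with plain integers. This is a rough sketch under simplifying assumptions, not the kernel change itself: `struct window`, `trim_only()`, and `trim_and_clamp()` are hypothetical, and the numbers in the usage note are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a normalized request: logical blocks [start, start + size). */
struct window {
    uint32_t start;
    uint32_t size;
};

/* Only truncates the size to the group size; start is left where it was. */
static struct window trim_only(struct window w, uint32_t blocks_per_group)
{
    if (w.size > blocks_per_group)
        w.size = blocks_per_group;
    return w;
}

/*
 * Also moves start up to the first block of the group-sized region that
 * contains goal, so the truncated window still covers goal.
 */
static struct window trim_and_clamp(struct window w, uint32_t goal,
                                    uint32_t blocks_per_group)
{
    uint32_t group_start = goal - (goal % blocks_per_group);

    if (w.start < group_start)
        w.start = group_start;
    if (w.size > blocks_per_group)
        w.size = blocks_per_group;
    return w;
}
```

For example, with a window of {0, 1024}, a goal of 300, and 256 blocks per group, `trim_only()` yields [0, 256), which ends before the goal, whereas `trim_and_clamp()` yields [256, 512), which contains it.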
    • Eric Biggers's avatar
      ext4: fix up test_dummy_encryption handling for new mount API · 85456054
      Eric Biggers authored
      Since ext4 was converted to the new mount API, the test_dummy_encryption
      mount option isn't being handled entirely correctly, because the needed
      fscrypt_set_test_dummy_encryption() helper function combines
      parsing/checking/applying into one function.  That doesn't work well
      with the new mount API, which split these into separate steps.
      
      This was sort of okay anyway, due to the parsing logic that was copied
      from fscrypt_set_test_dummy_encryption() into ext4_parse_param(),
      combined with an additional check in ext4_check_test_dummy_encryption().
      However, these overlooked the case of changing the value of
      test_dummy_encryption on remount, which isn't allowed but ext4 wasn't
      detecting until ext4_apply_options() when it's too late to fail.
      Another bug is that if test_dummy_encryption was specified multiple
      times with an argument, memory was leaked.
      
      Fix this up properly by using the new helper functions that allow
      splitting up the parse/check/apply steps for test_dummy_encryption.
      
      Fixes: cebe85d5 ("ext4: switch to the new mount api")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20220526040412.173025-1-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      85456054
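The value of splitting parse, check, and apply into separate steps can be sketched with a toy model in C. None of these names come from ext4 or fscrypt: `struct fs_state`, `struct parsed_opts`, and the three functions are hypothetical, and `-EINVAL` is an assumed error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical live state of a mounted filesystem. */
struct fs_state {
    bool dummy_enc;        /* option currently in effect */
};

/* Hypothetical staging area filled while options are being parsed. */
struct parsed_opts {
    bool dummy_enc_set;    /* option was given on this (re)mount */
};

/* Parse step: only records what was requested, touches no live state. */
static void parse_opt(struct parsed_opts *opts)
{
    opts->dummy_enc_set = true;
}

/* Check step: may still fail; rejects changing the option on remount. */
static int check_opts(const struct fs_state *fs,
                      const struct parsed_opts *opts, bool is_remount)
{
    if (is_remount && opts->dummy_enc_set && !fs->dummy_enc)
        return -EINVAL;
    return 0;
}

/* Apply step: runs only after the check succeeded, so it cannot fail. */
static void apply_opts(struct fs_state *fs, const struct parsed_opts *opts)
{
    if (opts->dummy_enc_set)
        fs->dummy_enc = true;
}
```

Because the rejection happens in the check step, a disallowed remount fails before any live state is modified, which is the property the commit message says was missing when the error was only detected at apply time.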