datastructure

file: /proc/sys/kernel/datastructure
variable: kernel.datastructure
Official reference

User-visible objects are backed by the following datastructures:

  • iommufd_ioas for IOMMUFD_OBJ_IOAS.
  • iommufd_device for IOMMUFD_OBJ_DEVICE.
  • iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING.
  • iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED.
  • iommufd_fault for IOMMUFD_OBJ_FAULT.
  • iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
  • iommufd_vdevice for IOMMUFD_OBJ_VDEVICE.
  • iommufd_veventq for IOMMUFD_OBJ_VEVENTQ.
  • iommufd_hw_queue for IOMMUFD_OBJ_HW_QUEUE.

Some terminology used when looking at these datastructures:

  • Automatic domain - refers to an iommu domain created automatically when attaching a device to an IOAS object. This is compatible with the semantics of VFIO type1.

  • Manual domain - refers to an iommu domain designated by the user as the target pagetable to be attached to by a device. Though currently there are no uAPIs to directly create such a domain, the datastructures and algorithms are ready to handle that use case.

  • In-kernel user - refers to something like a VFIO mdev that is using the IOMMUFD access interface to access the IOAS. This starts by creating an iommufd_access object, which is similar to the domain binding that a physical device would do. The access object then allows converting IOVA ranges into struct page * lists, or doing direct reads and writes to an IOVA.
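To make the "converting IOVA ranges into page lists" step concrete, here is a minimal userspace sketch. It is not the kernel's access interface: the `sketch_` names, the fixed 4 KiB page size, and the use of plain PFN numbers instead of struct page * are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical stand-in for one populated, already pinned IOVA range. */
struct sketch_mapped_range {
    uint64_t iova_start;  /* first IOVA covered, page aligned */
    const uint64_t *pfns; /* one PFN per page of the range */
    size_t npages;
};

/*
 * Copy the PFNs backing [iova, iova + length) into out[].
 * Returns the number of PFNs written, or -1 if the request is empty,
 * wraps around, falls outside the range, or does not fit in out[].
 */
long sketch_iova_to_pfns(const struct sketch_mapped_range *r,
                         uint64_t iova, uint64_t length,
                         uint64_t *out, size_t out_cap)
{
    if (length == 0 || iova < r->iova_start)
        return -1;

    uint64_t last_iova = iova + length - 1;
    if (last_iova < iova) /* address arithmetic wrapped */
        return -1;

    /* Page indexes of the first and last byte of the request. */
    uint64_t first = (iova - r->iova_start) / SKETCH_PAGE_SIZE;
    uint64_t last = (last_iova - r->iova_start) / SKETCH_PAGE_SIZE;
    if (last >= r->npages)
        return -1;

    uint64_t n = last - first + 1;
    if (n > out_cap)
        return -1;

    for (uint64_t i = 0; i < n; i++)
        out[i] = r->pfns[first + i];
    return (long)n;
}
```

An unaligned request that straddles a page boundary yields every page it touches, which is why the last index is computed from the final byte rather than from the length alone.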

iommufd_ioas serves as the metadata datastructure that manages how IOVA ranges are mapped to memory pages. It is composed of:

  • struct io_pagetable holding the IOVA map
  • struct iopt_areas representing populated portions of IOVA
  • struct iopt_pages representing the storage of PFNs
  • struct iommu_domain representing the IO page table in the IOMMU
  • struct iopt_pages_access representing in-kernel users of PFNs
  • struct xarray pinned_pfns holding a list of pages pinned by in-kernel users
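As a rough orientation, the relationship between the first three pieces could be modelled in userspace C as below. This is a simplified sketch, not the kernel's definitions: every `sketch_` name, the fixed-size array of areas, and the linear search in `sketch_find_area()` are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_AREAS 8 /* arbitrary limit for this sketch */

/* Stand-in for struct iopt_pages: a logical linear array of PFNs. */
struct sketch_pages {
    uint64_t *pfns;
    size_t npages;
};

/* Stand-in for struct iopt_area: a populated IOVA range backed by pages. */
struct sketch_area {
    uint64_t iova_start;        /* first IOVA covered (inclusive) */
    uint64_t iova_last;         /* last IOVA covered (inclusive) */
    struct sketch_pages *pages; /* storage this area points at */
    size_t page_offset;         /* index of the area's first page in pages */
};

/* Stand-in for struct io_pagetable: the IOVA map of one IOAS. */
struct sketch_io_pagetable {
    struct sketch_area areas[SKETCH_MAX_AREAS];
    size_t nareas;
};

/* Return the area that covers iova, or NULL if that IOVA is unpopulated. */
const struct sketch_area *
sketch_find_area(const struct sketch_io_pagetable *iopt, uint64_t iova)
{
    for (size_t i = 0; i < iopt->nareas; i++) {
        const struct sketch_area *a = &iopt->areas[i];

        if (iova >= a->iova_start && iova <= a->iova_last)
            return a;
    }
    return NULL;
}
```

The area holds a pointer to the page storage rather than embedding it, so several areas can reference the same storage.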

Each iopt_pages represents a logical linear array of full PFNs. The PFNs are ultimately derived from userspace VAs via an mm_struct. Once they have been pinned, the PFNs are stored in the IOPTEs of an iommu_domain, or inside the pinned_pfns xarray if they have been pinned through an iommufd_access.

PFNs have to be copied between all combinations of storage locations, depending on what domains are present and what kinds of in-kernel “software access” users exist. The mechanism ensures that a page is pinned only once.
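The pin-once rule can be illustrated with a small userspace sketch. It assumes a hypothetical per-page record that counts how many storage locations (domains or in-kernel accesses) currently hold the PFN. `sketch_pin_from_user()` merely stands in for pinning a userspace page, and none of these names exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one page of an iopt_pages-like object. */
struct sketch_page {
    uint64_t pfn;           /* valid only while users > 0 */
    unsigned int users;     /* storage locations currently holding the PFN */
    unsigned int pin_calls; /* how many times the page was really pinned */
};

/* Stand-in for pinning a userspace page; it only records the call. */
static uint64_t sketch_pin_from_user(struct sketch_page *pg, uint64_t fake_pfn)
{
    pg->pin_calls++;
    return fake_pfn;
}

/*
 * Obtain the PFN for a new storage location. The page is pinned only when
 * no other location holds it yet; otherwise the existing PFN is copied and
 * fake_pfn is ignored.
 */
uint64_t sketch_get_pfn(struct sketch_page *pg, uint64_t fake_pfn)
{
    if (pg->users == 0)
        pg->pfn = sketch_pin_from_user(pg, fake_pfn);
    pg->users++;
    return pg->pfn;
}

/*
 * Drop one storage location. Returns true when the last holder went away,
 * which is the point where the page would be unpinned.
 */
bool sketch_put_pfn(struct sketch_page *pg)
{
    if (pg->users == 0)
        return false;
    pg->users--;
    return pg->users == 0;
}
```

In this model a second consumer never triggers a second pin, no matter whether the first holder was a domain or an in-kernel access.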

An io_pagetable is composed of iopt_areas pointing at iopt_pages, along with a list of iommu_domains that mirror the IOVA to PFN map.

Multiple io_pagetables, through their iopt_areas, can share a single iopt_pages, which avoids multi-pinning and double accounting of page consumption.
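A minimal sketch of why sharing avoids double accounting, assuming a hypothetical reference count on the shared page storage and a single pinned-page counter. Both structures and both helpers are illustrative and are not kernel code.

```c
#include <stddef.h>

/* Hypothetical shared storage, referenced by areas of several page tables. */
struct sketch_shared_pages {
    size_t npages;
    unsigned int refs; /* number of areas currently pointing here */
};

/* Hypothetical accounting of pages charged to the user. */
struct sketch_accounting {
    size_t pinned_pages;
};

/* An area starts using the storage; only the first reference is charged. */
void sketch_area_attach(struct sketch_shared_pages *p,
                        struct sketch_accounting *acct)
{
    if (p->refs == 0)
        acct->pinned_pages += p->npages;
    p->refs++;
}

/* An area stops using the storage; the charge is dropped with the last one. */
void sketch_area_detach(struct sketch_shared_pages *p,
                        struct sketch_accounting *acct)
{
    if (p->refs == 0)
        return;
    p->refs--;
    if (p->refs == 0)
        acct->pinned_pages -= p->npages;
}
```

Attaching areas from two different page tables to the same storage leaves the counter at npages rather than twice that.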

iommufd_ioas is shareable between subsystems, e.g. VFIO and VDPA, as long as devices managed by different subsystems are bound to the same iommufd.