
Virtual Function I/O (VFIO) Mediated devices

The number of use cases for virtualizing DMA devices that do not have built-in SR-IOV capability is increasing. Previously, to virtualize such devices, developers had to create their own management interfaces and APIs, and then integrate them with user space software. To simplify integration with user space software, we have identified common requirements and a unified management interface for such devices.

The VFIO driver framework provides unified APIs for direct device access. It is an IOMMU/device-agnostic framework for exposing direct device access to user space in a secure, IOMMU-protected environment. This framework is used for multiple devices, such as GPUs, network adapters, and compute accelerators. With direct device access, virtual machines or user space applications have direct access to the physical device. This framework is reused for mediated devices.

The mediated core driver provides a common interface for mediated device management that can be used by drivers of different devices. This module provides a generic interface to perform these operations:

  • Create and destroy a mediated device
  • Add a mediated device to and remove it from a mediated bus driver
  • Add a mediated device to and remove it from an IOMMU group

The mediated core driver also provides an interface to register a bus driver. For example, the mediated VFIO mdev driver is designed for mediated devices and supports VFIO APIs. The mediated bus driver adds a mediated device to and removes it from a VFIO group.

The following high-level block diagram shows the main components and interfaces in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM devices as examples, because these devices were the first to use this module:

[Figure: high-level block diagram of the VFIO mediated driver framework]

Registration Interfaces

The mediated core driver provides the following types of registration interfaces:

  • Registration interface for a mediated bus driver
  • Physical device driver interface

Registration Interface for a Mediated Bus Driver

The registration interface for a mediated device driver provides the following structure to represent a mediated device’s driver:

struct mdev_driver {
        int  (*probe)  (struct mdev_device *dev);
        void (*remove) (struct mdev_device *dev);
        unsigned int (*get_available)(struct mdev_type *mtype);
        ssize_t (*show_description)(struct mdev_type *mtype, char *buf);
        struct device_driver    driver;
};

A mediated bus driver for mdev should use this structure in the function calls to register and unregister itself with the core driver:

  • Register: int mdev_register_driver(struct mdev_driver *drv);
  • Unregister: void mdev_unregister_driver(struct mdev_driver *drv);

The mediated bus driver’s probe function should create a vfio_device on top of the mdev_device and connect it to an appropriate implementation of vfio_device_ops.

When a driver wants to add the GUID creation sysfs files to an existing device that it has probed, it should call:

int mdev_register_parent(struct mdev_parent *parent, struct device *dev,
                    struct mdev_driver *mdev_driver);

This will provide the ‘mdev_supported_types/XX/create’ files, which can then be used to trigger the creation of an mdev_device. The created mdev_device will be attached to the specified driver.
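As a rough user space illustration of how a management tool might prepare a write to one of these create files, the sketch below checks that a string has the usual 8-4-4-4-12 textual UUID layout and builds the path of the create file. The function names uuid_looks_valid and mdev_create_path are hypothetical helpers invented for this example; they are not part of the kernel or of any library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if s has the textual UUID layout
 * 8-4-4-4-12 (hex digits separated by dashes), otherwise 0. */
int uuid_looks_valid(const char *s)
{
    static const size_t dash_pos[] = { 8, 13, 18, 23 };
    size_t i, d = 0;

    assert(s != NULL);
    if (strlen(s) != 36)
        return 0;
    for (i = 0; i < 36; i++) {
        if (d < 4 && i == dash_pos[d]) {
            if (s[i] != '-')
                return 0;
            d++;
        } else if (!isxdigit((unsigned char)s[i])) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: build
 * "<parent>/mdev_supported_types/<type_id>/create" into buf.
 * Returns 0 on success, -1 if the result would not fit. */
int mdev_create_path(char *buf, size_t len,
                     const char *parent, const char *type_id)
{
    int n;

    assert(buf != NULL && parent != NULL && type_id != NULL);
    n = snprintf(buf, len, "%s/mdev_supported_types/%s/create",
                 parent, type_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would validate the UUID first, build the path for the chosen parent device and type, and then write the UUID string to that file.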

When the driver needs to remove itself, it calls:

void mdev_unregister_parent(struct mdev_parent *parent);

This call unbinds and destroys all the created mdevs and removes the sysfs files.

Mediated Device Management Interface Through sysfs

The management interface through sysfs enables user space software, such as libvirt, to query and configure mediated devices in a hardware-agnostic fashion. This management interface provides flexibility to the underlying physical device’s driver to support features such as:

  • Mediated device hot plug
  • Multiple mediated devices in a single virtual machine
  • Multiple mediated devices from different physical devices

Links in the mdev_bus Class Directory

The /sys/class/mdev_bus/ directory contains links to devices that are registered with the mdev core driver.

Directories and Files Under the sysfs for Each Physical Device

|- [parent physical device]
|--- Vendor-specific-attributes [optional]
|--- [mdev_supported_types]
|     |--- [<type-id>]
|     |   |--- create
|     |   |--- name
|     |   |--- available_instances
|     |   |--- device_api
|     |   |--- description
|     |   |--- [devices]
|     |--- [<type-id>]
|     |   |--- create
|     |   |--- name
|     |   |--- available_instances
|     |   |--- device_api
|     |   |--- description
|     |   |--- [devices]
|     |--- [<type-id>]
|          |--- create
|          |--- name
|          |--- available_instances
|          |--- device_api
|          |--- description
|          |--- [devices]

  • [mdev_supported_types]

The list of currently supported mediated device types and their details.

[<type-id>], device_api, and available_instances are mandatory attributes that should be provided by the vendor driver.

  • [<type-id>]

The [<type-id>] name is created by adding the device driver string as a prefix to the string provided by the vendor driver. The format of this name is as follows:

sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);

  • device_api

This attribute shows which device API is being created, for example, “vfio-pci” for a PCI device.

  • available_instances

This attribute shows the number of devices of type <type-id> that can be created.

  • [devices]

This directory contains links to the devices of type <type-id> that have been created.

  • name

This attribute shows a human-readable name.

  • description

This attribute can show a brief description of the type and its features. This is an optional attribute.
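The [<type-id>] naming rule above can be mimicked in user space with the same "%s-%s" format. The sketch below is only an illustration: mdev_type_id is a hypothetical helper, and the driver string "mtty" with the type names "1" and "2" is an assumption inferred from the mtty-1 and mtty-2 directories of the sample driver.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: compose "<driver_name>-<type_name>" into buf,
 * mirroring the "%s-%s" format used for [<type-id>].
 * Returns the length of the name, or -1 if it would not fit. */
int mdev_type_id(char *buf, size_t len,
                 const char *driver_name, const char *type_name)
{
    int n;

    assert(buf != NULL && driver_name != NULL && type_name != NULL);
    n = snprintf(buf, len, "%s-%s", driver_name, type_name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Under the stated assumption, calling mdev_type_id(buf, sizeof buf, "mtty", "2") fills buf with the directory name mtty-2.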

Directories and Files Under the sysfs for Each mdev Device

|- [parent physical device]
|--- [$MDEV_UUID]
       |--- remove
       |--- mdev_type {link to its type}
       |--- vendor-specific-attributes [optional]

  • remove (write only)

Writing ‘1’ to the ‘remove’ file destroys the mdev device. The vendor driver can fail the remove() callback if that device is active and the vendor driver doesn’t support hot unplug.

Example:

# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
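Because sysfs attributes are plain text files, a program can do what the echo command does with ordinary file I/O. The following is a minimal sketch, not code from the kernel tree: sysfs_write_attr and sysfs_read_uint are hypothetical helper names, and the paths passed to them are supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a short string (for example "1") to an
 * attribute file such as .../remove. Returns 0 on success, -1 on error. */
int sysfs_write_attr(const char *path, const char *value)
{
    FILE *f;
    int rc = 0;

    assert(path != NULL && value != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(value, f) == EOF)
        rc = -1;
    /* stdio buffers the data, so a failed write may only be reported
     * when the stream is flushed on close. */
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}

/* Hypothetical helper: read an unsigned decimal attribute such as
 * available_instances. Returns 0 on success, -1 on error. */
int sysfs_read_uint(const char *path, unsigned int *out)
{
    FILE *f;
    int rc;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    rc = (fscanf(f, "%u", out) == 1) ? 0 : -1;
    fclose(f);
    return rc;
}
```

With these helpers, destroying a device amounts to calling sysfs_write_attr on its remove file with the value "1" and checking the return code, since the vendor driver may refuse the request.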

Mediated Device Hot Plug

Mediated devices can be created and assigned at runtime. The procedure to hot plug a mediated device is the same as the procedure to hot plug a PCI device.

Translation APIs for Mediated Devices

The following APIs are provided for translating user pfn to host pfn in a VFIO driver:

int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
                          int npage, int prot, struct page **pages);

void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
                            int npage);

These functions call back into the back-end IOMMU module by using the pin_pages and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently these callbacks are supported in the TYPE1 IOMMU module. Other IOMMU back-end modules, such as the PPC64 sPAPR module, need to provide these two callback functions before the APIs can be used with them.
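Both functions take a starting iova and a page count (npage) rather than a byte length. Assuming a caller starts from a byte range, the page count could be derived as in the sketch below. The helper pages_spanned is hypothetical and is written as plain user space C for illustration; the page size is passed in explicitly and is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of pages touched by the byte range
 * [iova, iova + len). page_size must be a power of two. */
uint64_t pages_spanned(uint64_t iova, uint64_t len, uint64_t page_size)
{
    uint64_t first, last;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (len == 0)
        return 0;
    /* Round the first and last byte addresses down to page boundaries. */
    first = iova & ~(page_size - 1);
    last = (iova + len - 1) & ~(page_size - 1);
    return (last - first) / page_size + 1;
}
```

For example, a 32-byte buffer that starts 16 bytes before a 4 KiB boundary touches two pages, so both pages would need to be pinned.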

Using the Sample Code

mtty.c in the samples/vfio-mdev/ directory is a sample driver program that demonstrates how to use the mediated device framework.

The sample driver creates an mdev device that simulates a serial port over a PCI card.

  1. Build and load the mtty.ko module.

This step creates a dummy device, /sys/devices/virtual/mtty/mtty/.

Files in this device directory in sysfs are similar to the following:

# tree /sys/devices/virtual/mtty/mtty/
   /sys/devices/virtual/mtty/mtty/
   |-- mdev_supported_types
   |   |-- mtty-1
   |   |   |-- available_instances
   |   |   |-- create
   |   |   |-- device_api
   |   |   |-- devices
   |   |   `-- name
   |   `-- mtty-2
   |       |-- available_instances
   |       |-- create
   |       |-- device_api
   |       |-- devices
   |       `-- name
   |-- mtty_dev
   |   `-- sample_mtty_dev
   |-- power
   |   |-- autosuspend_delay_ms
   |   |-- control
   |   |-- runtime_active_time
   |   |-- runtime_status
   |   `-- runtime_suspended_time
   |-- subsystem -> ../../../../class/mtty
   `-- uevent

  2. Create a mediated device by using the dummy device that you created in the previous step:

# echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" >    \
         /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create

  3. Add parameters to qemu-kvm:

-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001

  4. Boot the VM.

In the Linux guest VM, with no hardware on the host, the device appears as follows:

# lspci -s 00:05.0 -xxvv
00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
        Subsystem: Device 4348:3253
        Physical Slot: 5
        Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at c150 [size=8]
        Region 1: I/O ports at c158 [size=8]
        Kernel driver in use: serial

00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

In the Linux guest VM, dmesg output for the device is as follows:

serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A

  5. In the Linux guest VM, check the serial ports:

# setserial -g /dev/ttyS*
/dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
/dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
/dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10

  6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or /dev/ttyS2 with hardware flow control disabled.
  7. Type data on the minicom terminal or send data to the terminal emulation program and read the data.

Data is looped back from the host’s mtty driver.

  8. Destroy the mediated device that you created:

# echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove
