SCSI-MQ support in RHEL7

https://access.redhat.com/articles/2857501

Updated January 12, 2017, 23:52

Overview

Block-MQ is a queueing mechanism in newer Linux kernels, including RHEL7.2 and beyond, that provides greater concurrency and, potentially, higher performance.

It is supported in RHEL7.2 and beyond by the following block drivers:

  • mtip32xx: Block device driver for Micron PCIe SSDs
  • rbd: Rados block device driver for Ceph distributed objects
  • virtio_blk: Virtual block device driver (QEMU)
  • nvme: NVM Express block device driver (for SSDs connected directly to PCI or PCIe)

In addition, RHEL7.2 and beyond contains support for SCSI-MQ, which allows the SCSI subsystem to use Block-MQ queueing. However, this functionality is TECH PREVIEW in RHEL7 for general use. Individually updated drivers with their own support have been tested and are not Tech Preview, as noted below.

It is possible to enable SCSI-MQ globally for all SCSI drivers using the “scsi_mod.use_blk_mq=1” kernel option, but doing so results in the Tech Preview warning.

Enabling SCSI-MQ in a driver serves two purposes: it uses a different queueing API in the block layer, which has the potential to be more efficient, and it allows the use of multiple hardware queues per CPU if the HBA and the driver support it. There are no SCSI drivers in RHEL7.2 that support multiple hardware queues. The only driver in RHEL7.3 that supports multiple hardware queues is the Infiniband SRP driver.

In RHEL7.3 the Infiniband SRP driver was updated to a much more recent version, and uses SCSI-MQ by default. This is for the SRP driver ONLY. This was done because the upstream driver version on which the update was based is no longer used in non-SCSI-MQ mode.

Block-MQ can also be used by Device-Mapper Multipath using the “dm_mod.use_blk_mq=Y” kernel option. In general this should be enabled if the underlying SCSI-MQ or Infiniband/SRP driver has the use of Block-MQ enabled. Otherwise, the potential concurrency obtained by the use of multiple queues will be throttled by the DM Multipath queues.
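To check whether the two global options agree on a running system, a user-space sketch along the following lines may help. It assumes that the options are exposed as boolean module parameters under /sys/module/scsi_mod/parameters/use_blk_mq and /sys/module/dm_mod/parameters/use_blk_mq, and that they read back as "Y" or "N". The helper names are hypothetical. It only looks at the global parameters, so it would not notice a driver such as SRP that enables SCSI-MQ on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Interpret the text of a boolean module parameter as read from sysfs.
 * Returns 1 for enabled, 0 for disabled, -1 if the text is unrecognized. */
static int parse_bool_param(const char *text)
{
    assert(text != NULL);
    switch (text[0]) {
    case 'Y': case 'y': case '1':
        return 1;
    case 'N': case 'n': case '0':
        return 0;
    default:
        return -1;
    }
}

/* Read /sys/module/<module>/parameters/use_blk_mq (assumed path).
 * Returns 1, 0, or -1 if the file is absent or unreadable. */
static int read_use_blk_mq(const char *module)
{
    char path[256];
    char buf[8] = "";
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/module/%s/parameters/use_blk_mq", module);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);
    return parse_bool_param(buf);
}

/* Nonzero when SCSI is known to use Block-MQ but DM Multipath is not
 * known to, which is the throttling case described above. */
static int dm_may_throttle(int scsi_mq, int dm_mq)
{
    return scsi_mq == 1 && dm_mq != 1;
}
```

A caller could pass read_use_blk_mq("scsi_mod") and read_use_blk_mq("dm_mod") to dm_may_throttle() and print a warning when it returns nonzero.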

Updating a SCSI driver to use multiqueue in RHEL7

In the absence of any additional coding, enabling SCSI-MQ in the midlayer using scsi_mod.use_blk_mq=1 causes the midlayer to use BLK-MQ queueing and does not require any driver changes. If, however, the driver wishes to support multiple hardware queues, driver changes are necessary.

The driver should use the shost_use_blk_mq(shost) API to determine if SCSI-MQ is enabled. Do not test the scsi_use_blk_mq module parameter directly.

RHEL7.3 does not contain the upstream patch “scsi: use host wide tags by default” due to potential compatibility problems with drivers that were not tested and/or updated to more recent upstream versions. If a driver needs host-wide tags to work properly with SCSI-MQ (and it probably does), then it can enable this behavior by setting the RHEL-specific field “.use_host_wide_tags = 1” in the SCSI host template for the driver.

struct Scsi_Host.nr_hw_queues should be set to the number of hardware queues the driver needs to allocate. This would be set in the Scsi_Host structure by the driver before calling scsi_add_host(). If this field is zero (e.g. uninitialized), a value of 1 is used.

When a scsi_cmnd is received by the driver's _queuecommand() routine, the driver can obtain the BLK-MQ unique tag value and extract both the hardware queue index, and per-queue tag value:

        u32 tag = blk_mq_unique_tag(scmd->request);   /* BLK-MQ unique tag */
        u16 hwq = blk_mq_unique_tag_to_hwq(tag);      /* hardware queue index */
        u16 idx = blk_mq_unique_tag_to_tag(tag);      /* tag within that queue */

The “idx” value is guaranteed to remain unique within a hardware queue for the set of currently active commands, so it can be used, if desired, as an index into a table or other data structure.

Upon command completion, scsi_host_find_tag(shost, tag) can be used to look up a scsi_cmnd given a host-wide unique tag number. Other implementations are possible, such as storing a pointer to the scsi_cmnd in a driver's private structure, as most drivers do.
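The tag decoding and the private-structure lookup described above can be illustrated in user space. The sketch below assumes the layout used by the upstream helpers, with the hardware queue index in the upper 16 bits of the unique tag and the per-queue tag in the lower 16 bits. All sketch_ names, the queue count, and the queue depth are hypothetical; this is not kernel code and not the layout of any particular driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the upstream blk-mq unique tag layout. */
#define SKETCH_UNIQUE_TAG_BITS 16
#define SKETCH_UNIQUE_TAG_MASK ((1u << SKETCH_UNIQUE_TAG_BITS) - 1)

/* Hypothetical HBA geometry, chosen only for this example. */
#define SKETCH_NR_HWQ      4
#define SKETCH_QUEUE_DEPTH 64

/* Compose a unique tag from a queue index and a per-queue tag. */
static uint32_t sketch_unique_tag(uint16_t hwq, uint16_t idx)
{
    return ((uint32_t)hwq << SKETCH_UNIQUE_TAG_BITS) |
           (idx & SKETCH_UNIQUE_TAG_MASK);
}

static uint16_t sketch_unique_tag_to_hwq(uint32_t tag)
{
    return (uint16_t)(tag >> SKETCH_UNIQUE_TAG_BITS);
}

static uint16_t sketch_unique_tag_to_tag(uint32_t tag)
{
    return (uint16_t)(tag & SKETCH_UNIQUE_TAG_MASK);
}

/* Driver-private table of in-flight commands, indexed by [hwq][idx].
 * This works because idx is unique among the active commands of a queue. */
static void *sketch_inflight[SKETCH_NR_HWQ][SKETCH_QUEUE_DEPTH];

/* Remember a command when it is queued. */
static void sketch_track(uint32_t tag, void *cmd)
{
    uint16_t hwq = sketch_unique_tag_to_hwq(tag);
    uint16_t idx = sketch_unique_tag_to_tag(tag);

    assert(hwq < SKETCH_NR_HWQ && idx < SKETCH_QUEUE_DEPTH);
    sketch_inflight[hwq][idx] = cmd;
}

/* Find and release the command on completion; NULL if none is tracked. */
static void *sketch_complete(uint32_t tag)
{
    uint16_t hwq = sketch_unique_tag_to_hwq(tag);
    uint16_t idx = sketch_unique_tag_to_tag(tag);
    void *cmd;

    assert(hwq < SKETCH_NR_HWQ && idx < SKETCH_QUEUE_DEPTH);
    cmd = sketch_inflight[hwq][idx];
    sketch_inflight[hwq][idx] = NULL;
    return cmd;
}
```

In a real driver the table entries would be pointers to scsi_cmnd or to a per-command private structure, and each hardware queue would typically have its own lock or lockless completion path.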

The driver developer will have to determine how to use the number of available hardware queues on the HBA and how this will cause the available tag space to be divided among all the queues.
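One possible policy, shown here only as an illustration, is to use no more hardware queues than there are online CPUs and to split the HBA's tag space evenly among them. The sketch below assumes that policy. The structure and function names are hypothetical, and a real driver may have hardware constraints that call for a different split.

```c
#include <assert.h>

/* Result of the hypothetical planning step. */
struct sketch_queue_plan {
    unsigned int nr_hw_queues;   /* value a driver might set before scsi_add_host() */
    unsigned int tags_per_queue; /* share of the HBA tag space per queue */
};

/* Pick a queue count and an even tag split.
 * hba_queues:  hardware queues the HBA offers (0 if unknown)
 * online_cpus: CPUs available to submit I/O
 * hba_tags:    total outstanding commands the HBA can track (must be > 0) */
static struct sketch_queue_plan
sketch_plan_queues(unsigned int hba_queues, unsigned int online_cpus,
                   unsigned int hba_tags)
{
    struct sketch_queue_plan plan;
    unsigned int n = hba_queues < online_cpus ? hba_queues : online_cpus;

    assert(hba_tags > 0);
    if (n == 0)
        n = 1;            /* same fallback as an unset nr_hw_queues */
    if (n > hba_tags)
        n = hba_tags;     /* every queue needs at least one tag */

    plan.nr_hw_queues = n;
    plan.tags_per_queue = hba_tags / n;  /* any remainder is left unused */
    return plan;
}
```

For example, an HBA offering 8 queues and 1024 tags on a 4-CPU system would end up with 4 queues of 256 tags each under this policy.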

Because SCSI-MQ is in Tech Preview for RHEL7, all RHEL7 SCSI drivers that enable the use of SCSI-MQ should add a module option which allows the functionality to be enabled/disabled. The driver can, however, have SCSI-MQ enabled by default if desired.

The changes should look like what was added for the Infiniband/SRP driver:

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 002fe18..d9afe53 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -130,6 +130,11 @@ module_param(ch_count, uint, 0444);
 MODULE_PARM_DESC(ch_count,
                 "Number of RDMA channels to use for communication with an SRP target. Using more than one channel improves performance if the HCA supports multiple completion vectors. The default value is the minimum of four times the number of online CPU sockets and the number of completion vectors supported by the HCA.");

+/* set default to true */
+static bool srp_use_blk_mq = true;
+module_param_named(use_blk_mq, srp_use_blk_mq, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(use_blk_mq, "Use blk-mq for SRP");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device, void *client_data);
 static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
@@ -3233,6 +3239,8 @@ static ssize_t srp_create_target(struct device *dev,
        if (!target_host)
                return -ENOMEM;

+       target_host->use_blk_mq = srp_use_blk_mq;
+
        target_host->transportt  = ib_srp_transport_template;
        target_host->max_channel = 0;
        target_host->max_id      = 1;
           

These changes set the ->use_blk_mq flag in the allocated Scsi_Host structure prior to calling scsi_add_host(). Setting the flag individually will not trigger the tech preview warning.
