Notes on Enabling Intel QAT for ZFS in TrueNAS

I would like to share my notes on how to enable QAT with ZFS in TrueNAS. I built my first TrueNAS server with QAT back in 2022. I am currently rebuilding the machine with 25.04 and had to relearn how I did it, so I decided to write it down to make it easier next time. This is not a step-by-step guide, but it has all the information.

The note is at On Setting Intel QuickAssist Accelerator for ZFS · GitHub. Here is the text.

The QAT support in ZFS is mostly a research product. It does not get much maintenance, as you can see in the git history, and it has at least one bug that I had to fix on my own. In my 2+ years of using it, QAT+ZFS has worked out fine. YMMV.

Introduction

The plan was to build a TrueNAS storage server, but also to use it to host various containers and VMs for my services. Essentially, an all-in-one server (or all-in-BOOM if the server fails). Partly because my last NAS had a hardware RAID card, and with limited hardware resources in the new server, I was looking for a way to offload some ZFS work from the CPU. I did not find any ZFS-specific accelerator that I could get cheaply on eBay until I came across the QZFS paper in USENIX ATC 2019 (QZFS: QAT Accelerated Compression in File System for Application Agnostic and Cost Efficient Data Storage | USENIX). It demonstrated a way to use the Intel QuickAssist (QAT) accelerator to handle checksums and compression. And the QAT cards are relatively cheap on eBay.

ServeTheHome.com has a very good write-up about the different generations of QAT cards/accelerators (Intel QuickAssist Parts and Cards by QAT Generation). Basically, for ZFS purposes, there are two types that can be used:

  1. Gen 1 cards: Intel 8920/8950. These are 20 Gbps/50 Gbps cards. About $50 back in 2022 when I first attempted this.
  2. Gen 2 cards: Intel 8960/8970 (C620). These are 50 Gbps/100 Gbps cards. They essentially have an Intel C620 chipset in them, so I guess you could use the C620 on the motherboard if it has QAT enabled. These were expensive back then, but a few go for around $50 on eBay now.

From Gen 3 onward, QAT is built into the CPU (Ice Lake, etc.). The driver for Gen 3+ hardware does not have kernel API support, so those devices do not work with QAT in ZFS. On the software side, Intel has two QAT drivers, one for HW2.0 and one for HW1.x. Gen 1/2 cards need the HW1.x driver. The latest is at https://www.intel.com/content/www/us/en/download/19734/intel-quickassist-technology-driver-for-linux-hw-version-1-x.html.

The QZFS work is part of the mainline OpenZFS code. You need to rebuild the ZFS modules to use it. Here are the steps (a command sketch follows the list):

  1. Build and install the Intel QAT driver with --enable-kapi. This builds the QAT driver with the QAT kernel API, which QZFS uses. The driver installs out-of-tree kernel modules and firmware for the QAT cards.
  2. Build/rebuild ZFS with the environment variable ICP_ROOT=<path to QAT driver directory>. This enables the QAT support in ZFS. It also makes the zfs module depend on the intel_qat module.
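For reference, a minimal sketch of the two builds, assuming the QAT driver source is unpacked under /opt/intel-qat and a matching ZFS source checkout (paths are placeholders; exact flags can differ between driver and ZFS versions):

# Intel QAT HW1.x driver with the kernel API enabled
cd /opt/intel-qat
./configure --enable-kapi
make -j"$(nproc)" && make install

# ZFS, pointed at the QAT driver tree via ICP_ROOT
cd /usr/src/zfs
ICP_ROOT=/opt/intel-qat ./configure
make -j"$(nproc)" && make install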

In fact, you can follow this discussion to see how it is done on Debian 11 (Enabling QAT for ZFS with Debian 11 · openzfs/zfs · Discussion #12723 · GitHub).

Problem on TrueNAS SCALE

Because TrueNAS SCALE is an appliance and is not designed to be tinkered with like a regular Linux distro, I cannot just build the QAT driver and ZFS and replace them on a live system (I did not know about the developer mode back then). Also, it does not have DKMS, so I would need to do a live swap of the ZFS kernel modules. I suspect that would not work because the boot drive is also ZFS. Anything could happen if I swapped the ZFS module live, I guess.

So I took the route of rebuilding the TrueNAS image with QAT support enabled. There was a suggestion to add QAT support to TrueNAS officially (Jira), but it was not accepted. The Jira issue has an example of how to do this (NAS-107334 \ Build ZFS with QAT support by PrivatePuffin · Pull Request #41 · truenas/scale-build · GitHub). It creates an intel-qat Debian package and adds it to the TrueNAS build. It also changes the ZFS build configuration so that it builds against intel-qat.

TrueNAS Image with QAT Support in ZFS

Long story short, I just created my own version of the intel-qat package and TrueNAS build script. The current one is based on the latest Intel QAT driver for HW1.x and TrueNAS 25.04.

Just build and install it, and you will have a TrueNAS with QAT support in ZFS. A rough build sketch follows the list below.

  1. The intel-qat driver directory is at /opt/intel-qat. It contains the source code, sample configuration, and build directory.
  2. The sample configuration files are also available at /usr/share/qat/conf.
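If memory serves, the scale-build run looks roughly like this (the targets are my recollection of the scale-build README and may have changed; check the repository you are building from):

# In a checkout of the (modified) scale-build repository
make checkout     # fetch the TrueNAS sources, including the intel-qat package
make packages     # build the Debian packages (intel-qat, patched zfs, ...)
make update       # assemble the update image
make iso          # optionally, build an installer ISO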

Setting Up Intel QAT Devices

The setup for Intel QAT devices is quite messy, with a few surprises. I had to go through a bunch of Intel documents to figure out how to create the device configurations. Intel QAT is probably the worst device I have ever used. If anyone wants to read further, the documents are all at https://www.intel.com/content/www/us/en/developer/topic-technology/open/quick-assist-technology/resources.html.

Issue 1: SR-IOV

The first issue that came up was SR-IOV. Apparently, the QAT device is designed to work either on the host machine with SR-IOV disabled, or on guest VMs with Virtual Functions (VFs) passed through. But it turns out the VFs can be used on the host if they are not passed to any VM. I don’t remember where exactly I read about this (probably from VT or somewhere on the Internet), but it turned out to work. To do this, the following steps are needed (a command sketch follows the list).

  1. Build QAT driver with --enable-icp-sriov=host.
  2. Make sure SRIOV_ENABLE=1 is in /etc/default/qat.
  3. Remove the kernel module for your device’s VF from /etc/modprobe.d/blacklist-qat-vfs.conf. For Intel 8920/8950, remove qat_dh895xccvf. For Intel 8960/8970, remove qat_c62xvf. The VF drivers are blacklisted by default and not loaded, which makes the VFs unusable on the host.
  4. Create configuration files for the QAT Physical Function (PF). Depending on your device, you will need to create /etc/dh895xcc_dev0.conf or /etc/c6xx_dev[0-2].conf (C620 needs three files). Just copy the content from the sample configuration file dh895xccpf_dev0.conf or c6xxpf_dev0.conf for now.
  5. Create configuration files for the QAT Virtual Function (VF). Depending on your device, you will need to create /etc/dh895xccvf_dev0.conf or /etc/c6xxvf_dev0.conf (C620 needs three files). Just copy the content from the sample configuration file dh895xccvf_dev0.conf.vm or c6xxvf_dev0.conf.vm for now.
  6. Start/restart the QAT service to bring up the devices: systemctl restart qat.service.
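A condensed sketch of these steps for a C620-based card (the dh895xcc variants are analogous; the sample configuration paths are the ones from my package, mentioned above):

# Build the driver with the kernel API and SR-IOV enabled on the host
cd /opt/intel-qat
./configure --enable-kapi --enable-icp-sriov=host
make -j"$(nproc)" && make install

# SR-IOV should be enabled for the service
grep SRIOV_ENABLE /etc/default/qat        # expect SRIOV_ENABLE=1

# Let the VF driver load on the host
sed -i '/qat_c62xvf/d' /etc/modprobe.d/blacklist-qat-vfs.conf

# Start from the sample configurations (C620 has three PFs)
for i in 0 1 2; do
  cp /usr/share/qat/conf/c6xxpf_dev0.conf /etc/c6xx_dev$i.conf
done
# One VF configuration to begin with; add c6xxvf_dev1.conf, ... for more VFs
cp /usr/share/qat/conf/c6xxvf_dev0.conf.vm /etc/c6xxvf_dev0.conf

systemctl restart qat.service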

Once these steps are done, run adf_ctl status to list all QAT devices. You should see that some devices are up. In the C620’s case, there will be three c6xx devices up (PFs) and one c6xxvf device up (VF). Here is what I see on my machine.

Checking status of all devices.
There is 51 QAT acceleration device(s) in the system:
 qat_dev0 - type: c6xx,  inst_id: 0,  node_id: 0,  bsf: 0000:04:00.0,  #accel: 5 #engines: 10 state: up
 qat_dev1 - type: c6xx,  inst_id: 1,  node_id: 0,  bsf: 0000:06:00.0,  #accel: 5 #engines: 10 state: up
 qat_dev2 - type: c6xx,  inst_id: 2,  node_id: 0,  bsf: 0000:08:00.0,  #accel: 5 #engines: 10 state: up
 qat_dev3 - type: c6xxvf,  inst_id: 0,  node_id: 0,  bsf: 0000:04:01.0,  #accel: 1 #engines: 1 state: up
 qat_dev4 - type: c6xxvf,  inst_id: 1,  node_id: 0,  bsf: 0000:04:01.5,  #accel: 1 #engines: 1 state: down
 qat_dev5 - type: c6xxvf,  inst_id: 2,  node_id: 0,  bsf: 0000:04:01.6,  #accel: 1 #engines: 1 state: down
 qat_dev6 - type: c6xxvf,  inst_id: 3,  node_id: 0,  bsf: 0000:04:01.1,  #accel: 1 #engines: 1 state: down
 qat_dev7 - type: c6xxvf,  inst_id: 4,  node_id: 0,  bsf: 0000:04:01.7,  #accel: 1 #engines: 1 state: down
 qat_dev8 - type: c6xxvf,  inst_id: 5,  node_id: 0,  bsf: 0000:04:01.2,  #accel: 1 #engines: 1 state: down
 qat_dev9 - type: c6xxvf,  inst_id: 6,  node_id: 0,  bsf: 0000:04:02.0,  #accel: 1 #engines: 1 state: down
 qat_dev10 - type: c6xxvf,  inst_id: 7,  node_id: 0,  bsf: 0000:04:01.3,  #accel: 1 #engines: 1 state: down
 qat_dev11 - type: c6xxvf,  inst_id: 8,  node_id: 0,  bsf: 0000:04:02.1,  #accel: 1 #engines: 1 state: down
 qat_dev12 - type: c6xxvf,  inst_id: 9,  node_id: 0,  bsf: 0000:04:01.4,  #accel: 1 #engines: 1 state: down
 qat_dev13 - type: c6xxvf,  inst_id: 10,  node_id: 0,  bsf: 0000:04:02.2,  #accel: 1 #engines: 1 state: down
 qat_dev14 - type: c6xxvf,  inst_id: 11,  node_id: 0,  bsf: 0000:04:02.3,  #accel: 1 #engines: 1 state: down
 qat_dev15 - type: c6xxvf,  inst_id: 12,  node_id: 0,  bsf: 0000:04:02.4,  #accel: 1 #engines: 1 state: down
 qat_dev16 - type: c6xxvf,  inst_id: 13,  node_id: 0,  bsf: 0000:04:02.5,  #accel: 1 #engines: 1 state: down
 qat_dev17 - type: c6xxvf,  inst_id: 14,  node_id: 0,  bsf: 0000:04:02.6,  #accel: 1 #engines: 1 state: down
 qat_dev18 - type: c6xxvf,  inst_id: 15,  node_id: 0,  bsf: 0000:04:02.7,  #accel: 1 #engines: 1 state: down
 qat_dev19 - type: c6xxvf,  inst_id: 16,  node_id: 0,  bsf: 0000:06:01.0,  #accel: 1 #engines: 1 state: down
 qat_dev20 - type: c6xxvf,  inst_id: 17,  node_id: 0,  bsf: 0000:06:01.1,  #accel: 1 #engines: 1 state: down
 qat_dev21 - type: c6xxvf,  inst_id: 18,  node_id: 0,  bsf: 0000:06:01.2,  #accel: 1 #engines: 1 state: down
 qat_dev22 - type: c6xxvf,  inst_id: 19,  node_id: 0,  bsf: 0000:06:01.3,  #accel: 1 #engines: 1 state: down
 qat_dev23 - type: c6xxvf,  inst_id: 20,  node_id: 0,  bsf: 0000:06:01.4,  #accel: 1 #engines: 1 state: down
 qat_dev24 - type: c6xxvf,  inst_id: 21,  node_id: 0,  bsf: 0000:06:01.5,  #accel: 1 #engines: 1 state: down
 qat_dev25 - type: c6xxvf,  inst_id: 22,  node_id: 0,  bsf: 0000:06:01.6,  #accel: 1 #engines: 1 state: down
 qat_dev26 - type: c6xxvf,  inst_id: 23,  node_id: 0,  bsf: 0000:06:01.7,  #accel: 1 #engines: 1 state: down
 qat_dev27 - type: c6xxvf,  inst_id: 24,  node_id: 0,  bsf: 0000:06:02.0,  #accel: 1 #engines: 1 state: down
 qat_dev28 - type: c6xxvf,  inst_id: 25,  node_id: 0,  bsf: 0000:06:02.1,  #accel: 1 #engines: 1 state: down
 qat_dev29 - type: c6xxvf,  inst_id: 26,  node_id: 0,  bsf: 0000:06:02.2,  #accel: 1 #engines: 1 state: down
 qat_dev30 - type: c6xxvf,  inst_id: 27,  node_id: 0,  bsf: 0000:06:02.3,  #accel: 1 #engines: 1 state: down
 qat_dev31 - type: c6xxvf,  inst_id: 28,  node_id: 0,  bsf: 0000:06:02.4,  #accel: 1 #engines: 1 state: down
 qat_dev32 - type: c6xxvf,  inst_id: 29,  node_id: 0,  bsf: 0000:06:02.5,  #accel: 1 #engines: 1 state: down
 qat_dev33 - type: c6xxvf,  inst_id: 30,  node_id: 0,  bsf: 0000:06:02.6,  #accel: 1 #engines: 1 state: down
 qat_dev34 - type: c6xxvf,  inst_id: 31,  node_id: 0,  bsf: 0000:06:02.7,  #accel: 1 #engines: 1 state: down
 qat_dev35 - type: c6xxvf,  inst_id: 32,  node_id: 0,  bsf: 0000:08:01.0,  #accel: 1 #engines: 1 state: down
 qat_dev36 - type: c6xxvf,  inst_id: 33,  node_id: 0,  bsf: 0000:08:01.1,  #accel: 1 #engines: 1 state: down
 qat_dev37 - type: c6xxvf,  inst_id: 34,  node_id: 0,  bsf: 0000:08:01.2,  #accel: 1 #engines: 1 state: down
 qat_dev38 - type: c6xxvf,  inst_id: 35,  node_id: 0,  bsf: 0000:08:01.3,  #accel: 1 #engines: 1 state: down
 qat_dev39 - type: c6xxvf,  inst_id: 36,  node_id: 0,  bsf: 0000:08:01.4,  #accel: 1 #engines: 1 state: down
 qat_dev40 - type: c6xxvf,  inst_id: 37,  node_id: 0,  bsf: 0000:08:01.5,  #accel: 1 #engines: 1 state: down
 qat_dev41 - type: c6xxvf,  inst_id: 38,  node_id: 0,  bsf: 0000:08:01.6,  #accel: 1 #engines: 1 state: down
 qat_dev42 - type: c6xxvf,  inst_id: 39,  node_id: 0,  bsf: 0000:08:01.7,  #accel: 1 #engines: 1 state: down
 qat_dev43 - type: c6xxvf,  inst_id: 40,  node_id: 0,  bsf: 0000:08:02.0,  #accel: 1 #engines: 1 state: down
 qat_dev44 - type: c6xxvf,  inst_id: 41,  node_id: 0,  bsf: 0000:08:02.1,  #accel: 1 #engines: 1 state: down
 qat_dev45 - type: c6xxvf,  inst_id: 42,  node_id: 0,  bsf: 0000:08:02.2,  #accel: 1 #engines: 1 state: down
 qat_dev46 - type: c6xxvf,  inst_id: 43,  node_id: 0,  bsf: 0000:08:02.3,  #accel: 1 #engines: 1 state: down
 qat_dev47 - type: c6xxvf,  inst_id: 44,  node_id: 0,  bsf: 0000:08:02.4,  #accel: 1 #engines: 1 state: down
 qat_dev48 - type: c6xxvf,  inst_id: 45,  node_id: 0,  bsf: 0000:08:02.5,  #accel: 1 #engines: 1 state: down
 qat_dev49 - type: c6xxvf,  inst_id: 46,  node_id: 0,  bsf: 0000:08:02.6,  #accel: 1 #engines: 1 state: down
 qat_dev50 - type: c6xxvf,  inst_id: 47,  node_id: 0,  bsf: 0000:08:02.7,  #accel: 1 #engines: 1 state: down

Now, running the examples from the QAT driver should work on the system.
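For a quick sanity check, the driver’s bundled cpa_sample_code can be run from the driver’s build directory (assuming the samples were built along with the driver; signOfLife=1 does a shortened run):

cd /opt/intel-qat/build
./cpa_sample_code signOfLife=1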

Issue 2: Configuration Files

The QAT support in ZFS uses the QAT kernel API. It is enabled in the driver, but the configuration files also need some work.

  1. ENABLE_KAPI=1 needs to be set in /etc/default/qat. If the driver was built correctly, this should already be done.
  2. The VF’s configuration file needs to have the [KERNEL_QAT] section, and only this section should have non-zero NumberCyInstances and NumberDcInstances. The sample configurations dh895xccvf_dev0.conf.vm.km and c6xxvf_dev0.conf.vm.km are good references.
  3. Because the KERNEL_QAT instances bind interrupt handlers to individual cores (set by the CoreAffinity parameter in the file), it is probably a good idea to have multiple VFs bound to different cores. This can be done by copying the configuration files and updating the core affinity accordingly (see the sketch after this list). You can have as many VFs as adf_ctl lists. Each VF needs its own configuration file.
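A minimal sketch of that copy-and-edit step, assuming the c6xx VF configuration is in /etc/c6xxvf_dev0.conf and each additional VF should have its instances pinned to a different core (the core mapping is just an example; adjust it to your core count):

for i in $(seq 1 15); do
  cp /etc/c6xxvf_dev0.conf /etc/c6xxvf_dev$i.conf
  # Rewrite every CoreAffinity line in the copy to point at core $i
  sed -i -E "s/CoreAffinity = [0-9]+/CoreAffinity = $i/" /etc/c6xxvf_dev$i.conf
done
systemctl restart qat.service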

Normally, it would end here. But I decided to customize the configuration a bit further, which led me to a few quirks in the QAT configuration.

According to the documentation, the maximum number of instances each VF can have depends on which services are enabled via the ServicesEnabled parameter. By default, it is set to cy;dc, which means all crypto and compression services are enabled. In this case, each VF can only have 1 Cy instance and 1 Dc instance. This is because the symmetric and asymmetric crypto services need separate resources (Sec. 4.3.3.2 of the PG). Since I only use the QAT for ZFS, the asymmetric crypto service is useless to me. If I only enable the sym and dc services, I should be able to get 2 Cy and 2 Dc instances per VF. I need this because I want at least 1 Cy and 1 Dc instance for each of the 56 cores, but there are only 48 VFs. So 2 Cy and 2 Dc per VF would be perfect.

For the C620 to have 2 Cy and 2 Dc instances per VF, the VF configuration needs to look like this.

[GENERAL]
ServicesEnabled = dc;sym

ConfigVersion = 2

# Default values for number of concurrent requests
CyNumConcurrentSymRequests = 512
CyNumConcurrentAsymRequests = 64

# Statistics, valid values: 1,0
statsGeneral = 1
statsDh = 0
statsDrbg = 0
statsDsa = 0
statsEcc = 0
statsKeyGen = 0
statsDc = 1
statsLn = 0
statsPrime = 0
statsRsa = 0
statsSym = 1

##############################################
# Kernel Instances Section
##############################################
[KERNEL]
NumberCyInstances = 0
NumberDcInstances = 0

#############################################
# Kernel Instances Section for QAT API
#############################################
[KERNEL_QAT]
NumberCyInstances = 2
NumberDcInstances = 2

# Crypto - Kernel instance #0
Cy0Name = "SSL0"
Cy0IsPolled = 1
# List of core affinities
Cy0CoreAffinity = 0

# Data Compression - Kernel instance #0
Dc0Name = "Dc0"
Dc0IsPolled = 1
# List of core affinities
Dc0CoreAffinity = 0

# Crypto - Kernel instance #1
Cy1Name = "SSL1"
Cy1IsPolled = 1
# List of core affinities
Cy1CoreAffinity = 1

# Data Compression - Kernel instance #1
Dc1Name = "Dc1"
Dc1IsPolled = 1
# List of core affinities
Dc1CoreAffinity = 1

##############################################
# User Process Instance Section
##############################################
[SSL]
NumberCyInstances = 0
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0

And the PF configuration looks like this.

[GENERAL]
ServicesEnabled = dc;sym

ServicesProfile = DEFAULT

ConfigVersion = 2

#Default values for number of concurrent requests
CyNumConcurrentSymRequests = 512
CyNumConcurrentAsymRequests = 64

#Statistics, valid values: 1,0
statsGeneral = 1
statsDh = 1
statsDrbg = 1
statsDsa = 1
statsEcc = 1
statsKeyGen = 1
statsDc = 1
statsLn = 1
statsPrime = 1
statsRsa = 1
statsSym = 1


# Specify size of intermediate buffers for which to
# allocate on-chip buffers. Legal values are 32 and
# 64 (default is 64). Specify 32 to optimize for
# compressing buffers <=32KB in size.
DcIntermediateBufferSizeInKB = 64

##############################################
# Kernel Instances Section
##############################################
[KERNEL]
NumberCyInstances = 0
NumberDcInstances = 0

##############################################
# User Process Instance Section
##############################################
[SSL]
NumberCyInstances = 0
NumberDcInstances = 0
NumProcesses = 1
LimitDevAccess = 0

Issue 3: Polling Mode vs Interrupt Mode

This one is fairly simple. The QAT kernel API instances can work in either polling mode or interrupt mode. This is controlled by the IsPolled parameter. ZFS needs interrupt mode, so IsPolled = 0 is what the configuration needs to have. But this breaks the 2 Cy + 2 Dc per VF configuration: according to the PG, the increased instance count only works with instances in polling mode.
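Switching everything over to interrupt mode across the VF configuration files is a one-liner; a sketch (and, given the limitation above, this in practice also means dropping back to 1 Cy + 1 Dc instance per VF):

# Flip every instance in the VF configs to interrupt mode
sed -i 's/IsPolled = 1/IsPolled = 0/' /etc/c6xxvf_dev*.conf
systemctl restart qat.service
adf_ctl status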

Enabling QAT in ZFS

QAT is disabled by default in ZFS. Enable it via the module parameters under /sys/module/zfs/parameters.

# cat /sys/module/zfs/parameters/zfs_qat_checksum_disable 
0
# cat /sys/module/zfs/parameters/zfs_qat_compress_disable 
0
# cat /sys/module/zfs/parameters/zfs_qat_encrypt_disable 
0

These values are 1 by default. They need to be 0 for ZFS to use QAT. You can set one or more of them depending on what you need. I use init/shutdown scripts in TrueNAS to set/unset them, so I don’t have to worry about them being set before qat.service is started.
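For example, the post-init script boils down to this (the matching shutdown script writes 1 back before qat.service stops):

echo 0 > /sys/module/zfs/parameters/zfs_qat_checksum_disable
echo 0 > /sys/module/zfs/parameters/zfs_qat_compress_disable
echo 0 > /sys/module/zfs/parameters/zfs_qat_encrypt_disable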

Then, the following kstat should show QAT usage.

# cat /proc/spl/kstat/zfs/qat

A Few More Notes

These are notes to myself, so I don’t need to dig through the documents every time.

  1. Using VFs for the kernel API and userspace applications. You can do this, just not with the same VF. A VF can be used either for the kernel API or for userspace. So you need to balance how many of them are used for each purpose and configure them accordingly. I have no comment on how to set this up, but it should be simple by following the documentation (I hope, but Intel did a pretty poor job here).
  2. Available resources, summarized here. A dh895xcc device (8920/8950) has one PF. Each PF has 32 VFs. A C620 device (8960/8970) has three PFs. Each PF has 16 VFs, so 48 VFs in total.
  3. The VFs are just queues in the hardware. All VFs of a dh895xcc device share the same underlying accelerator hardware. The C620 is three accelerators connected by an internal PCIe switch, so VFs from the same PF share the hardware, but VFs from different PFs are actually different devices.
  4. The CoreAffinity parameter is only effective when IsPolled=1. The interrupt handler of one instance can only be bound to one core.
  5. /sys/kernel/debug/qat_* has a bunch of useful stats and info.
  6. The kernel log shows a bunch of “There are 1 requests pending” messages and the QAT device cannot be stopped. This is due to a bug in the QAT code in ZFS. Basically, it looks like when an interrupt callback is triggered, the request may still be in flight in the session. In that case, trying to remove the session results in the aforementioned message and a “retry” return value. But the QAT code in ZFS never checks the return value and retries, therefore leaving some sessions in the device and never removing them. The Intel guide suggests checking whether there is an in-flight request before removing the session. I have a patch here: qzfs-fix · GitHub. Since this issue has never been fixed in the upstream ZFS code, I suspect that I might be the only user who uses QAT with ZFS regularly.

Awesome work!

Have you also looked at AMD EPYC?

We should have a private chat.

I would like to try AMD EPYC, but I don’t have a system. I expect the QAT cards to work with AMD EPYC without problems. They don’t seem to rely on anything specific to Intel CPUs.

Nice! I am curious whether you have done any real benchmarking at the upper performance end with it? We played with QAT a while back, but didn’t see any real wins with regard to getting more perf from the system overall. But it is potentially good if you want to save CPU cycles for something else.

I did run iozone with the old build, but haven’t tried any benchmarks with the new one. I might still have those results (need to dig a little bit). The savings in CPU cycles were pretty good. The usage meter in the UI barely moved.

I plan to do some testing later with ksmbd. I’d be happy to run some fio or other tests.