Behavior of RAIDZ2 pool has changed, unsure why; fast for a few MB and then slows to KBps

Long post, I apologize for the detail, but I didn’t want to leave out information needed to answer the question. Summary: I am not an expert at setting up RAIDZ2, and I suspect something in my setup is bogging down writes. I’m looking for recommendations and the reasoning behind them. My initial suspicion is that I need to add a metadata, log, or cache vdev, though that’s little more than a hunch right now.

  • Environment: TrueNAS Scale 24.10.2.4
  • Dell PowerEdge R730xd, disks presented directly, not RAID-configured
  • Data Disks: Micron_5200_MTFDDAK1T9TDD
  • OS Disk: OCZ-Vertex3
  • NIC: X520-DA2 10Gb
  • Memory: 62.8 GiB, ECC. Currently showing 6.7 GiB to services, 22.9 GiB to ZFS cache.
  • CPU: Xeon E5-2623 v3 @ 3GHz. 2 CPUs, total of 8 cores.
  • RAID configuration: 2 × RAIDZ2 | 6 wide | 1.75 TiB drives
    • No metadata, log, cache, or dedup vdevs.
    • 2 × spare drives
  • Currently 61% full. Usable capacity 13.8 TiB, used 8.41 TiB.
  • ZFS health shows good, no errors, and no SMART errors.
  • 3 datasets, plus the iocage/ix-applications
    • 1 NFS share, 1 SMB share, and one shared over both SMB and NFS

Testing when I changed to 10Gb in my house about six months ago showed consistent large file transfers (multi-gig movie files) that would top out around 1 GBps and settle around 600 MBps (network traffic of 9.2 Gbps, settling around 4.8 Gbps). I established this was to be expected: the ZFS cache in memory fills up, then writes continue at disk speed. I was happy with it.

I retired several of my other pools over the last several months, deleted them, and removed the disks. At some point afterwards, I noticed that file transfers to my one remaining pool would start at full speed for a few seconds, then slow to something on the order of KBps to 2 MBps.

I have tested the network layer with iperf and confirmed that it still delivers the expected throughput for 10Gb.

I have started preparing for the upgrade to 25.04 to see whether the problem is tied to the old version of TrueNAS, but when I created a tar file of my Docker apps as a backup, I noticed performance was extremely slow; the same behavior appeared internally as on the SMB share. This leads me to believe the issue is with either the hardware or the ZFS configuration, and it also eliminates the share protocol from the list of suspects. I then tested on the NFS share (again, same pool, different dataset) and confirmed the same behavior: fast for a couple of megabytes, then slowing to somewhere between KBps and a few MBps.

Due to being a new member, I am unable to upload the output of an extended `iostat -x 1` run; can someone suggest a method of doing so? In the meantime, I’ll post a comment with a subset of what I’m seeing in the results.
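In case it helps anyone suggest a method: here is roughly how I captured and trimmed the output. The `/tmp` paths and the 90% threshold are just what I picked, and the tiny synthetic sample below stands in for real `iostat` output so the filter itself is visible.

```shell
# On the live system you would first capture samples, e.g.:
#   iostat -x 1 300 > /tmp/iostat-capture.txt
# The awk filter trims such a capture down to header rows plus any device
# row whose %util (the last field) is >= 90, so the paste stays small.
printf '%s\n' \
  'Device            r/s   w/s  %util' \
  'sda              0.00  4.00 100.00' \
  'sdb              0.00  0.00   0.00' > /tmp/iostat-capture.txt

awk '/^Device|^avg-cpu/ || $NF+0 >= 90' /tmp/iostat-capture.txt
```

With a real capture, the same filter cuts the file down to only the saturated devices plus the headers needed to read them.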

Current RAIDZ2 vdevs:

  • RAIDZ2 vdev 1:
    • sda, sdc, sde, sdf, sdg, sdh
  • RAIDZ2 vdev 2:
    • sdj, sdk, sdl, sdm, sdn, sdo
  • Spares:
    • sdb, sdi

What I noticed is that for extended periods of time, one of the RAIDZ vdevs (what I’m calling vdev 1, the sda-sdh group) has at least one disk saturated at 100% utilization, and then I periodically get a burst of activity from the other vdev. I am at a loss as to what I need to do to fix this.
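In case it’s useful, this is the kind of per-vdev view I’ve been watching alongside iostat (with `tank` standing in for my actual pool name):

```shell
# Per-vdev and per-disk I/O, refreshed every second for 30 samples.
# If one RAIDZ2 vdev carries nearly all the writes while the other
# sits idle, the imbalance shows up clearly here.
zpool iostat -v tank 1 30
```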

I am wondering if, when I deleted the unused pools, I somehow deleted something that was being used as a log, cache, or metadata vdev for this pool. I don’t know how I would have done that, but I can’t say it’s impossible. Either way, what should I look at next for troubleshooting? I don’t want to upgrade to 25.04 if the I/O for this pool is abysmal or if the whole pool setup is already broken. I do have Intel SSD D3-S4510 Series 1.92 TB drives that I could replace the Microns with, if it would help (I retired my company’s datacenter a while back and they let me keep the hardware), as well as spare Microns of the same model if simply adding a log, cache, or metadata vdev would help.
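My plan, if someone can confirm it makes sense, is to check whether the pool ever had such a vdev attached at all; `zpool history` records every configuration change since pool creation (`tank` is a placeholder for my pool name):

```shell
# Current topology: any log, cache, or special vdev would be listed here.
zpool status -v tank

# Every zpool command ever run against the pool; filter for vdev changes.
zpool history tank | grep -Ei 'add|remove|attach|detach'
```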

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.06    0.25   19.12    0.00   80.50

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    4.00    120.00     0.00   0.00  867.75    30.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    3.47 100.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00   72.00    712.00     0.00   0.00    0.12     9.89    0.00      0.00     0.00   0.00    0.00     0.00    2.00    0.00    0.01   0.00
sde              1.00     16.00     0.00   0.00  904.00    16.00    2.00    172.00     0.00   0.00  900.00    86.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    2.71  85.60
sdf              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sdg              1.00     12.00     0.00   0.00 1333.00    12.00    2.00    200.00     0.00   0.00 2836.00   100.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    7.00 100.00
sdh              1.00     12.00     0.00   0.00  666.00    12.00    1.00     40.00     0.00   0.00  667.00    40.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.33  90.40
sdi              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdj              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdk              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdl              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdm              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdn              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdo              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.94    0.06    0.25   13.39    0.00   85.36

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              1.00     32.00     0.00   0.00    1.00    32.00    2.00     32.00     0.00   0.00  167.00    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.33  20.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    2.00    256.00     0.00   0.00 2133.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    4.27  63.60
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              1.00     32.00     0.00   0.00    0.00    32.00    3.00    232.00     0.00   0.00  690.33    77.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    2.07  62.80
sdf              1.00     12.00     0.00   0.00 2080.00    12.00    1.00    128.00     0.00   0.00 2133.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    4.21  87.60
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00  58.40
sdh              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sdi              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdj              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdk              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdl              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdm              1.00      4.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdn              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdo              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.13    0.06    0.06   12.51    0.00   87.24

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              1.00     32.00     0.00   0.00  956.00    32.00    2.00    108.00     0.00   0.00 1500.50    54.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    3.96  62.40
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdf              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
sdh              0.00      0.00     0.00   0.00    0.00     0.00    2.00    256.00     0.00   0.00 2058.50   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    4.12 100.00
sdi              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdj              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdk              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdl              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdm              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdn              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdo              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.94    0.06    0.31   12.52    0.00   86.16

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              5.00    148.00     0.00   0.00    0.40    29.60    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.40
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              7.00    184.00     0.00   0.00    0.29    26.29    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.80
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              6.00    180.00     0.00   0.00    0.50    30.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.80
sdf              1.00     32.00     0.00   0.00    0.00    32.00    3.00    112.00     0.00   0.00 1059.67    37.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    3.18  36.80
sdg              6.00    180.00     0.00   0.00  379.33    30.00    4.00    360.00     0.00   0.00 1470.50    90.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    8.16  93.60
sdh              2.00     36.00     0.00   0.00   29.00    18.00    2.00    108.00     0.00   0.00 1600.00    54.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    3.26  36.40
sdi              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdj              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdk              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdl              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdm              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdn              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdo              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.06    0.13   12.96    0.00   86.79

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              2.00     40.00     0.00   0.00   50.00    20.00   15.00     72.00     0.00   0.00   71.13     4.80    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.17  76.40
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              2.00     44.00     0.00   0.00  116.50    22.00   12.00     60.00     0.00   0.00   75.00     5.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.13  72.80
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              2.00     44.00     0.00   0.00    0.00    22.00   13.00     72.00     0.00   0.00   61.62     5.54    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.80  56.40
sdf              4.00     52.00     0.00   0.00   41.50    13.00   11.00     52.00     0.00   0.00   66.73     4.73    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.90  63.20
sdg              2.00     12.00     0.00   0.00  120.50     6.00   14.00     80.00     0.00   0.00   69.07     5.71    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.21  79.60
sdh              2.00     16.00     0.00   0.00   49.50     8.00   13.00     60.00     0.00   0.00   53.92     4.62    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.80  59.60
sdi              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdj              0.00      0.00     0.00   0.00    0.00     0.00   12.00     56.00     0.00   0.00   47.33     4.67    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.57  53.20
sdk              2.00     44.00     0.00   0.00   67.00    22.00    9.00     40.00     0.00   0.00   48.22     4.44    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.57  49.60
sdl              2.00     44.00     0.00   0.00   67.00    22.00    9.00     48.00     1.00  10.00   44.56     5.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.53  50.00
sdm              0.00      0.00     0.00   0.00    0.00     0.00    9.00     40.00     0.00   0.00   48.22     4.44    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.43  53.20
sdn              3.00     44.00     0.00   0.00   67.00    14.67   10.00     48.00     0.00   0.00   66.70     4.80    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.87  56.40
sdo              2.00     44.00     0.00   0.00  100.50    22.00    8.00     36.00     0.00   0.00   54.12     4.50    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.63  56.40

Additional Info: I am running the following apps:
qBittorrent, Resilio Sync, Plex, and Nginx Proxy Manager.
Nginx Proxy Manager hasn’t started reliably for a while (I think I’ll have to reconfigure it after updating to the 25.04 line), and Plex has issues transcoding PGS subtitles into movies, but for the most part Plex seems operational. No VMs, jails, or other loads on the machine.

You can try working through this article to see if it helps pinpoint anything. If you had deleted a metadata sVDEV, you would have killed the pool. L2ARC and SLOG devices can be removed from pools without harming them. SLOG is not a write cache; you only need it for sync writes, i.e. databases, block storage (iSCSI, zvols for VMs), and NFS. Do you have block storage? The recommendation is to keep the pool below 50% full for block storage.
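You can check for leftover block storage and the sync-write settings from the shell; a sketch, with `tank` as a placeholder pool name:

```shell
# Any zvols (block storage) still present anywhere on the system?
zfs list -t volume

# Which datasets request sync writes (standard/always/disabled)?
zfs get -r -t filesystem sync tank
```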

Have you run SMART long tests on your drives and checked the results? Another item to check is the HBA / RAID card: make sure its mode didn’t get flipped. Check cables and backplanes and see if you can find any pattern, like a bad breakout cable or port.
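For the SMART part, something along these lines (device names taken from your pool layout; the tests run in the background and typically take a while per drive):

```shell
# Start a long self-test on each data disk.
for d in sda sdc sde sdf sdg sdh sdj sdk sdl sdm sdn sdo; do
  smartctl -t long /dev/$d
done

# Afterwards, review the self-test log and key health attributes per disk.
smartctl -a /dev/sda | grep -Ei 'self-test|reallocated|pending|wear'
```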

I ran SMART long tests when I first noticed the issue; no errors detected. I also did a fresh scrub.

One of the old pools that I retired was an iSCSI block device for my old VMware node. At this point I no longer run block devices.

Now that I think about it, when I first noticed something, I did try simplifying my setup. Previously I was using an enclosure attached via an HBA (I’ll have to look up the model; an LSI of some sort, I think). I moved the disks to the built-in slots, made sure the built-in RAID card was set to direct passthrough, and started it up; the issues went away. I forgot about that, though, because at the same time I discovered the 10Gb switch in my office was overheating, so I direct-connected my desktop to the 10Gb run down to my server area and attributed the improvement to that. I made the mistake of changing too many variables at the same time: enclosure, HBA, and network topology.

Anyway, I’ll double-check the HBA settings, make sure it’s still leaving the disks alone, and I’ll look through that link you sent.

(Edit: Clarification: when I say the issues went away, they went away for about 1-2 months and resurfaced a week ago.)

If your Dell “HBA” card has a cache, try to turn it off. You didn’t mention the model; some are not good with ZFS unless they are a true HBA flashed to IT mode.

Hoping the details are covered or explained by this:

iostat.txt (5.3 MB)

It now allowed me to upload my iostat output file. Not sure it’s all that useful, but uploading it nonetheless.

The initial HBA was definitely a true HBA, not a RAID card. I’ll check which model it is when I get a moment tomorrow (later today). The current controller is a PERC H730 Mini, and it reports that the current controller mode is HBA.

Name	PERC H730 Mini (Embedded)
Device Description	Integrated RAID Controller 1
Controller Mode	HBA
Security Status	Not Assigned
Encryption Mode	None
Firmware Version	25.5.3.0005
Driver Version	--NA--
Cache Memory Size	1024 MB
SAS Address	0x51866DA09E63BB00
PCI Vendor ID	0x1000
PCI Subvendor ID	0x1028
PCI Device ID	0x5d
PCI Subdevice ID	0x1f49
PCI Bus	0x2
PCI Device	0x0
PCI Function	0x0
Slot Type	Information Not Available
Slot Length	Information Not Available
Bus Width	Information Not Available
Copyback Mode	On
Patrol Read Rate	30%
Patrol Read State	Stopped
Patrol Read Mode	Auto
Check Consistency Rate	30%
Check Consistency Mode	Normal
Rebuild Rate	30%
BGI Rate	30%
Reconstruct Rate	30%
Max Capable Speed	12.0 Gbps
Persistent Hotspare	Disabled
Load Balance Setting	Auto
Preserved Cache	Not Present
Time Interval for Spin Down	30 minutes
Spindown Unconfigured Drives	Disabled
Spindown Hotspares	Disabled
Learn Mode	Not Supported
T10 PI Capability	Not Capable
Support RAID10 Uneven Spans	Supported
Support Enhanced Auto Foreign Import	Supported
Enhanced Auto Import Foreign Config	Disabled
Support Controller Boot Mode	Supported
Controller Boot Mode	Continue Boot On Error
Real-time Configuration Capability	Capable

If you have a single disk in the array that shows much higher utilization than the others, that (to me) implies the disk is not performing correctly. Since you have Z2, I would suggest replacing the disk with one of the spares and seeing if that makes a difference.
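A sketch of the swap, assuming your pool is named `tank` and picking `sdg` (pinned at 100% in several of your samples) as the suspect:

```shell
# Resilver the suspect disk's data onto hot spare sdb.
zpool replace tank sdg sdb

# Watch resilver progress; once complete and healthy, drop the old disk.
zpool status -v tank
zpool detach tank sdg
```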

I did an internet search for that model; based on the second article that came up, I suggest doing further research on that card and maybe getting a true HBA.

Fair enough, I’ll fall back to the true HBA I have with the enclosure; I was hoping to save power by using the built-in slots, but so be it, I suppose.

Unfortunately, I’m not seeing a single disk that is consistently higher; it’s all the disks in a single RAIDZ vdev, not one disk. It’s almost as though one RAIDZ vdev is being hit primarily and the other catches up later.

Did you install both vdevs at the same time, or was one added later? If vdev 1 was 80% full when you added vdev 2, you would expect most writes to go to the second vdev as the system tries to balance them, even though the pool would report about 40% capacity.
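You can check how balanced they are with `zpool list -v` (pool name `tank` is a placeholder), which reports capacity per vdev:

```shell
# The CAP column is shown per vdev: a large gap between the two RAIDZ2
# vdevs would explain ZFS steering most new writes to the emptier one.
zpool list -v tank
```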

They were installed at the same time; I created the pool with the specs currently shown.
My tar of the Docker apps finally completed (about 8 hours to back up 6 GB of data), and this morning I performed the upgrade to 25.04; as of this morning, the issue is not occurring. I’m still seeing the odd behaviour of some devices in vdev 1 spiking while none in vdev 2 do, but less often, less consistently, and performance is not impacted.

I’m going to watch it for a few days and see if the issue comes back. If I decide to recreate the pool using my Intel drives and my actual HBA, what would be the optimal configuration for a roughly 13 TB share on 12 × 2 TB drives? Would I recreate it as RAIDZ2, or is some other layout preferable? I do have a full backup instance running (NetBackup with separate storage and dedup), but I’d prefer a level of redundancy so that if a disk goes bad I have time to replace it without doing a 6 TB restore.

Sounds like a SLOG is not needed, as I no longer have block storage, but would I want to use L2ARC? Would there be an advantage to adding a metadata vdev?

I would guess you are fine without L2ARC. You can look at the arc_summary statistics and see what your hit rate is, along with the other stats. You can add and remove L2ARC from a pool without having to destroy and recreate the pool, unlike an sVDEV (metadata). The L2ARC can also be tuned a bit for metadata.
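A sketch of both checks, with `tank` as a placeholder pool name:

```shell
# ARC efficiency: a consistently high hit ratio suggests an L2ARC
# would add little.
arc_summary | grep -iA2 'hit ratio'

# If you do add an L2ARC and want it biased toward metadata, the
# secondarycache property controls what it caches (all|none|metadata).
zfs set secondarycache=metadata tank
```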

Take a look at the pool layout whitepaper for the trade-offs between the different layouts. The sVDEV info may help you a bit, too:

  • TrueNAS Systems pool layout whitepaper (White Papers | TrueNAS - Open Enterprise Storage)
  • Special VDEV (sVDEV) Planning, Sizing, and Considerations
