Very poor write performance on HDD pool

Hello all,

Let me start my post with the technical facts:

Technical Facts:

Proxmox Host:

  • Xeon E5-1650 v4
  • 128GB ECC Memory
  • 10 Gig Intel Networking

TrueNAS VM

  • TrueNAS-13.0-U6.4
  • 8 Cores assigned
  • 32 GB memory

Pools:

  • Pool 1:

    • 6x WD Red 12 TB (WD120EFAX)
    • 1 vdev as RaidZ2
    • Legacy encryption
    • Atime off
    • Deduplication off
    • Compression off
    • Record size 1M
  • Pool 2:

    • 2x Crucial MX 500 1TB SSDs (CT1000MX500SSD1)
    • mirrored
    • Atime off
    • Deduplication off
    • Compression lz4
    • Record size 128 KiB

The Problem:
In the past the performance of Pool 1 was fine, but lately the write performance is very poor. When writing a large file to the pool, the transfer rate drops from initially over 900 MB/s to only 20-40 MB/s. When copying the same file to Pool 2 instead (same machine, etc.), the transfer rate also drops from over 900 MB/s initially, but then settles at a very steady rate above 400 MB/s. Read speeds from Pool 1 are also not great, but at 300-400 MB/s on average they are much better than the write speeds.

I checked the CPU load and memory consumption during a transfer and could not see any peaks that would indicate a problem in that area. I also assume the network setup is fine, since I reach much better performance on Pool 2, which uses the same host / hardware / network.

What could cause these performance issues?

Beyond TN being virtualized or the Legacy Encryption (I didn’t realize you could still run that on CORE 13, btw), you mean? :slight_smile:

How full is pool1?

Yeah, I suspected something like that would come up ;). But I have been running this setup for quite some time now, and in the past I did not face these issues.

Pool 1 is currently sitting at 72% usage.

Any recent update/change? What’s the fragmentation rate of pool1? How are the drives connected?

I am not fully aware of when the issue popped up, which makes it difficult to pinpoint whether something changed. I noticed the issue some weeks ago but did not have time to follow up on it.

The fragmentation rate seems to be 7% (first time I am learning about this). The drives are connected via SATA to an LSI SAS 2008 controller that is handed over to TN via PCI passthrough in Proxmox.
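For anyone wanting to check the same values, both figures come straight from zpool list; a quick sketch (pool name pool1 assumed here, adjust to your own):

```shell
# Capacity and free-space fragmentation in one line.
# "frag" is fragmentation of the remaining free space, not of the stored data.
zpool list -o name,size,alloc,free,cap,frag pool1
```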

Take a look at 2MB/sec SMB transfer speed to TrueNAS Scale Storage - #38 by HoneyBadger and below.

Your assumption is that my poor performance is because of disabled write caches? I cannot check it using hdparm as it is not available in TN CORE. Is the expectation that it should be enabled?

Would it be fine to be enabled? I have a UPS, so I do not expect any sudden power cuts, but I am not sure about the general recommendation.

Is there also a tool that I can use to check it on CORE?

Edit: I checked my drives via camcontrol identify, and all of them show write cache as supported and enabled.
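For anyone else on CORE wanting to do the same: a rough sketch of the check, assuming the disks show up as da0 through da5 (adjust to your device names):

```shell
# Print the write-cache line from camcontrol identify for each pool disk.
# Device names da0..da5 are an assumption; list yours with `camcontrol devlist`.
for disk in da0 da1 da2 da3 da4 da5; do
  echo "=== ${disk} ==="
  camcontrol identify "${disk}" | grep -i "write cache"
done
```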


I am trying to exclude the things that can be quickly identified and fixed first. Yes, it should be enabled.

Please cd into /mnt/pool1 and run the following, then post the output.

fio --name TEST --eta-newline=5s --ioengine=posixaio --rw=write --size=150g --io_size=650g --blocksize=128k --iodepth=16 --numjobs=8 --runtime=120 --group_reporting

After that, remember to delete the spawned file in order to avoid wasting space.

Maybe it’s a combination of spinning HDDs + single 6-wide RAIDZ2 vdev + 72% allocated space + 7% fragmented free space?

From 900 MB/s down to 40 MB/s for large files is IMHO a pretty harsh performance decrease for 72% allocated space and that insignificant bit of free-space fragmentation. If I were to guess, something else is at work here.

Maybe you should first test the network speed against a RAM disk. That would help you identify whether the bottleneck is actually pool1.
Since pool 2 also shows the initial drop, it might be a network issue too.
Also check your HDDs’ SMART data. There might be a drive on the brink of death, and this could be a symptom of that process.
And finally, check your SATA cables. They can “age” and develop flaky contacts.
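Sketched out, the two quick isolation tests could look like this (iperf3 on both ends, the server IP, and device names da0..da5 are all assumptions to adapt):

```shell
# 1) Raw network throughput, taking both pools out of the picture.
#    Run `iperf3 -s` on the TrueNAS side first; replace the IP with yours.
iperf3 -c 192.168.1.10 -t 30

# 2) Per-drive SMART health plus the attributes that usually matter.
for disk in da0 da1 da2 da3 da4 da5; do
  echo "=== ${disk} ==="
  smartctl -H -A "/dev/${disk}" | grep -Ei "result|Reallocated|Pending|CRC|FAILING"
done
```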


I tried running fio, but this will take ages because of the poor write speed.

fio --name TEST --eta-newline=5s --ioengine=posixaio --rw=write --size=150g --io_size=650g --blocksize=128k --iodepth=16 --numjobs=8 --runtime=120 --group_reporting
TEST: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=16
...
fio-3.28
Starting 8 processes
TEST: Laying out IO file (1 file / 153600MiB)
TEST: Laying out IO file (1 file / 153600MiB)
TEST: Laying out IO file (1 file / 153600MiB)
TEST: Laying out IO file (1 file / 153600MiB)
TEST: Laying out IO file (1 file / 153600MiB)
TEST: Laying out IO file (1 file / 153600MiB)
TEST: Laying out IO file (1 file / 153600MiB)
TEST: Laying out IO file (1 file / 153600MiB)
Jobs: 8 (f=8): [W(8)][0.1%][w=71.9MiB/s][w=575 IOPS][eta 03h:31m:37s]
Jobs: 8 (f=8): [W(8)][0.1%][w=34.4MiB/s][w=275 IOPS][eta 05h:35m:06s]
Jobs: 8 (f=8): [W(8)][0.1%][w=25.9MiB/s][w=206 IOPS][eta 07h:31m:41s]
Jobs: 8 (f=8): [W(8)][0.1%][w=23.1MiB/s][w=184 IOPS][eta 09h:20m:40s]
Jobs: 8 (f=8): [W(8)][0.1%][w=37.3MiB/s][w=298 IOPS][eta 10h:50m:30s]
Jobs: 8 (f=8): [W(8)][0.1%][w=35.9MiB/s][w=287 IOPS][eta 12h:15m:22s]
Jobs: 8 (f=8): [W(8)][0.1%][w=32.1MiB/s][w=257 IOPS][eta 13h:35m:11s]
Jobs: 8 (f=8): [W(8)][0.1%][w=35.3MiB/s][w=282 IOPS][eta 14h:46m:46s]
Jobs: 8 (f=8): [W(8)][0.1%][w=34.5MiB/s][w=275 IOPS][eta 16h:12m:52s]
Jobs: 8 (f=8): [W(8)][0.1%][w=33.7MiB/s][w=269 IOPS][eta 17h:13m:19s]
Jobs: 8 (f=8): [W(8)][0.1%][w=31.7MiB/s][w=253 IOPS][eta 18h:08m:38s]
Jobs: 8 (f=8): [W(8)][0.1%][w=35.3MiB/s][w=282 IOPS][eta 18h:59m:44s]
Jobs: 8 (f=8): [W(8)][0.1%][w=32.2MiB/s][w=257 IOPS][eta 19h:48m:55s]
Jobs: 8 (f=8): [W(8)][0.1%][w=34.7MiB/s][w=277 IOPS][eta 20h:33m:43s]
Jobs: 8 (f=8): [W(8)][0.1%][w=29.3MiB/s][w=234 IOPS][eta 21h:18m:42s]
Jobs: 8 (f=8): [W(8)][0.1%][w=32.5MiB/s][w=259 IOPS][eta 22h:17m:41s]
Jobs: 8 (f=8): [W(8)][0.1%][w=32.1MiB/s][w=256 IOPS][eta 23h:03m:57s]
Jobs: 8 (f=8): [W(8)][0.1%][w=34.3MiB/s][w=274 IOPS][eta 23h:38m:57s]
Jobs: 8 (f=8): [W(8)][0.1%][w=32.1MiB/s][w=257 IOPS][eta 01d:00h:12m:43s]
Jobs: 8 (f=8): [W(8)][0.1%][w=34.7MiB/s][w=277 IOPS][eta 01d:00h:44m:40s]
Jobs: 8 (f=8): [W(8)][0.1%][w=25.5MiB/s][w=204 IOPS][eta 01d:01h:17m:05s]
Jobs: 8 (f=8): [W(8)][0.1%][w=32.4MiB/s][w=259 IOPS][eta 01d:01h:48m:39s]
Jobs: 8 (f=8): [W(8)][0.1%][w=42.3MiB/s][w=338 IOPS][eta 01d:02h:17m:21s]
Jobs: 8 (f=8): [W(8)][0.1%][w=35.9MiB/s][w=287 IOPS][eta 01d:02h:48m:27s]
Jobs: 8 (f=8): [W(8)][0.1%][w=35.3MiB/s][w=282 IOPS][eta 01d:03h:10m:53s]
Jobs: 8 (f=8): [W(8)][0.1%][w=33.5MiB/s][w=267 IOPS][eta 01d:03h:32m:40s]
Jobs: 8 (f=8): [W(8)][0.1%][w=34.1MiB/s][w=272 IOPS][eta 01d:03h:53m:41s]
Jobs: 8 (f=8): [W(8)][0.1%][w=33.0MiB/s][w=264 IOPS][eta 01d:04h:11m:57s]
Jobs: 8 (f=8): [W(8)][0.1%][w=34.1MiB/s][w=272 IOPS][eta 01d:04h:32m:26s]
Jobs: 8 (f=8): [W(8)][0.1%][w=30.6MiB/s][w=245 IOPS][eta 01d:05h:03m:50s]
Jobs: 8 (f=8): [W(8)][0.2%][w=33.6MiB/s][w=268 IOPS][eta 01d:05h:23m:30s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.7MiB/s][w=261 IOPS][eta 01d:05h:39m:11s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.0MiB/s][w=256 IOPS][eta 01d:05h:58m:53s]
Jobs: 8 (f=8): [W(8)][0.2%][w=33.1MiB/s][w=264 IOPS][eta 01d:06h:14m:04s]
Jobs: 8 (f=8): [W(8)][0.2%][w=28.5MiB/s][w=228 IOPS][eta 01d:06h:34m:17s]
Jobs: 8 (f=8): [W(8)][0.2%][w=30.6MiB/s][w=244 IOPS][eta 01d:06h:51m:08s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.0MiB/s][w=255 IOPS][eta 01d:07h:16m:43s]
Jobs: 8 (f=8): [W(8)][0.2%][w=31.5MiB/s][w=252 IOPS][eta 01d:07h:29m:11s]
Jobs: 8 (f=8): [W(8)][0.2%][w=33.6MiB/s][w=269 IOPS][eta 01d:07h:43m:01s]
Jobs: 8 (f=8): [W(8)][0.2%][w=30.9MiB/s][w=247 IOPS][eta 01d:07h:55m:22s]
Jobs: 8 (f=8): [W(8)][0.2%][w=33.8MiB/s][w=270 IOPS][eta 01d:08h:07m:06s]
Jobs: 8 (f=8): [W(8)][0.2%][w=39.3MiB/s][w=314 IOPS][eta 01d:08h:19m:12s]
Jobs: 8 (f=8): [W(8)][0.2%][w=36.4MiB/s][w=291 IOPS][eta 01d:08h:26m:39s]
Jobs: 8 (f=8): [W(8)][0.2%][w=33.3MiB/s][w=266 IOPS][eta 01d:08h:36m:30s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.8MiB/s][w=262 IOPS][eta 01d:08h:54m:26s]
Jobs: 8 (f=8): [W(8)][0.2%][w=31.9MiB/s][w=255 IOPS][eta 01d:09h:04m:26s]
Jobs: 8 (f=8): [W(8)][0.2%][w=34.2MiB/s][w=273 IOPS][eta 01d:09h:13m:33s]
Jobs: 8 (f=8): [W(8)][0.2%][w=31.2MiB/s][w=250 IOPS][eta 01d:09h:23m:39s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.8MiB/s][w=262 IOPS][eta 01d:09h:34m:43s]
Jobs: 8 (f=8): [W(8)][0.2%][w=29.2MiB/s][w=233 IOPS][eta 01d:09h:47m:37s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.4MiB/s][w=258 IOPS][eta 01d:09h:58m:25s]
Jobs: 8 (f=8): [W(8)][0.2%][w=31.6MiB/s][w=252 IOPS][eta 01d:10h:16m:42s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.4MiB/s][w=259 IOPS][eta 01d:10h:23m:39s]
Jobs: 8 (f=8): [W(8)][0.2%][w=29.8MiB/s][w=238 IOPS][eta 01d:10h:36m:19s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.6MiB/s][w=260 IOPS][eta 01d:10h:44m:52s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.2MiB/s][w=257 IOPS][eta 01d:10h:53m:01s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.5MiB/s][w=259 IOPS][eta 01d:10h:58m:55s]
Jobs: 8 (f=8): [W(8)][0.2%][w=32.7MiB/s][w=261 IOPS][eta 01d:11h:06m:40s]
Jobs: 8 (f=8): [W(8)][0.2%][w=34.0MiB/s][w=271 IOPS][eta 01d:11h:19m:39s]
Jobs: 8 (f=8): [W(8)][0.2%][w=34.6MiB/s][w=276 IOPS][eta 01d:11h:25m:56s]
Jobs: 8 (f=8): [W(8)][0.2%][w=39.4MiB/s][w=315 IOPS][eta 01d:11h:35m:00s]
Jobs: 8 (f=8): [W(8)][0.2%][w=34.0MiB/s][w=272 IOPS][eta 01d:11h:38m:59s]
^Cbs: 8 (f=8): [W(8)][0.3%][w=33.3MiB/s][w=266 IOPS][eta 01d:11h:43m:31s]
fio: terminating on signal 2

TEST: (groupid=0, jobs=1): err= 0: pid=33560: Fri Jan  3 11:37:43 2025
  write: IOPS=42, BW=5445KiB/s (5576kB/s)(1735MiB/326208msec); 0 zone resets
    slat (nsec): min=1878, max=18040k, avg=31700.67, stdev=212713.47
    clat (usec): min=7, max=908729, avg=375868.70, stdev=205608.47
     lat (usec): min=37, max=908740, avg=375900.40, stdev=205568.72
    clat percentiles (usec):
     |  1.00th=[    45],  5.00th=[   457], 10.00th=[  1123], 20.00th=[ 67634],
     | 30.00th=[425722], 40.00th=[455082], 50.00th=[467665], 60.00th=[476054],
     | 70.00th=[488637], 80.00th=[505414], 90.00th=[534774], 95.00th=[574620],
     | 99.00th=[700449], 99.50th=[742392], 99.90th=[843056], 99.95th=[868221],
     | 99.99th=[901776]
   bw (  KiB/s): min= 1709, max=587474, per=12.63%, avg=5479.81, stdev=23506.80, samples=634
   iops        : min=   13, max= 4589, avg=42.36, stdev=183.64, samples=634
  lat (usec)   : 10=0.01%, 20=0.65%, 50=0.41%, 100=0.87%, 250=1.98%
  lat (usec)   : 500=1.38%, 750=2.31%, 1000=1.70%
  lat (msec)   : 2=3.70%, 4=2.62%, 10=1.18%, 20=0.84%, 50=1.48%
  lat (msec)   : 100=2.15%, 250=2.93%, 500=53.27%, 750=22.10%, 1000=0.40%
  cpu          : usr=0.05%, sys=0.06%, ctx=11780, majf=0, minf=2
  IO depths    : 1=0.8%, 2=1.8%, 4=3.9%, 8=50.5%, 16=43.1%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.2%, 8=1.1%, 16=0.7%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13877,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
TEST: (groupid=0, jobs=1): err= 0: pid=33561: Fri Jan  3 11:37:43 2025
  write: IOPS=42, BW=5425KiB/s (5555kB/s)(1728MiB/326245msec); 0 zone resets
    slat (nsec): min=1419, max=26742k, avg=35578.69, stdev=308227.36
    clat (usec): min=6, max=1180.9k, avg=377177.90, stdev=205920.79
     lat (usec): min=36, max=1180.9k, avg=377213.48, stdev=205878.19
    clat percentiles (usec):
     |  1.00th=[     44],  5.00th=[    482], 10.00th=[   1270],
     | 20.00th=[  72877], 30.00th=[ 429917], 40.00th=[ 455082],
     | 50.00th=[ 467665], 60.00th=[ 476054], 70.00th=[ 488637],
     | 80.00th=[ 505414], 90.00th=[ 534774], 95.00th=[ 583009],
     | 99.00th=[ 692061], 99.50th=[ 759170], 99.90th=[ 935330],
     | 99.95th=[1098908], 99.99th=[1182794]
   bw (  KiB/s): min= 1259, max=590788, per=12.57%, avg=5452.27, stdev=23544.08, samples=635
   iops        : min=    9, max= 4615, avg=42.13, stdev=183.93, samples=635
  lat (usec)   : 10=0.04%, 20=0.64%, 50=0.48%, 100=1.02%, 250=1.28%
  lat (usec)   : 500=1.79%, 750=2.17%, 1000=1.43%
  lat (msec)   : 2=3.42%, 4=2.94%, 10=1.56%, 20=0.71%, 50=1.61%
  lat (msec)   : 100=2.05%, 250=2.90%, 500=53.61%, 750=21.78%, 1000=0.51%
  lat (msec)   : 2000=0.06%
  cpu          : usr=0.05%, sys=0.05%, ctx=11662, majf=1, minf=1
  IO depths    : 1=0.7%, 2=1.7%, 4=3.7%, 8=50.5%, 16=43.4%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.2%, 8=1.1%, 16=0.7%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13827,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
TEST: (groupid=0, jobs=1): err= 0: pid=33562: Fri Jan  3 11:37:43 2025
  write: IOPS=43, BW=5507KiB/s (5640kB/s)(1754MiB/326190msec); 0 zone resets
    slat (usec): min=2, max=10846, avg=36.80, stdev=210.12
    clat (usec): min=6, max=1482.8k, avg=371580.60, stdev=208938.61
     lat (usec): min=36, max=1482.8k, avg=371617.39, stdev=208891.37
    clat percentiles (usec):
     |  1.00th=[     41],  5.00th=[    445], 10.00th=[   1106],
     | 20.00th=[  46924], 30.00th=[ 421528], 40.00th=[ 455082],
     | 50.00th=[ 467665], 60.00th=[ 476054], 70.00th=[ 488637],
     | 80.00th=[ 501220], 90.00th=[ 530580], 95.00th=[ 574620],
     | 99.00th=[ 692061], 99.50th=[ 767558], 99.90th=[ 968885],
     | 99.95th=[1061159], 99.99th=[1166017]
   bw (  KiB/s): min=  732, max=640509, per=12.78%, avg=5543.87, stdev=25515.05, samples=634
   iops        : min=    5, max= 5003, avg=42.85, stdev=199.31, samples=634
  lat (usec)   : 10=0.02%, 20=0.76%, 50=0.50%, 100=0.89%, 250=1.76%
  lat (usec)   : 500=1.42%, 750=1.92%, 1000=1.98%
  lat (msec)   : 2=4.46%, 4=2.68%, 10=1.42%, 20=0.88%, 50=1.45%
  lat (msec)   : 100=2.11%, 250=2.84%, 500=53.28%, 750=21.08%, 1000=0.46%
  lat (msec)   : 2000=0.10%
  cpu          : usr=0.05%, sys=0.05%, ctx=11868, majf=0, minf=2
  IO depths    : 1=0.8%, 2=2.0%, 4=4.2%, 8=50.3%, 16=42.6%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.0%, 8=1.2%, 16=0.8%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,14035,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
TEST: (groupid=0, jobs=1): err= 0: pid=33563: Fri Jan  3 11:37:43 2025
  write: IOPS=42, BW=5394KiB/s (5523kB/s)(1719MiB/326244msec); 0 zone resets
    slat (usec): min=2, max=10798, avg=32.99, stdev=183.70
    clat (usec): min=6, max=1056.3k, avg=379427.46, stdev=204036.37
     lat (usec): min=38, max=1056.3k, avg=379460.45, stdev=203993.42
    clat percentiles (usec):
     |  1.00th=[     46],  5.00th=[    523], 10.00th=[   1385],
     | 20.00th=[  88605], 30.00th=[ 434111], 40.00th=[ 455082],
     | 50.00th=[ 467665], 60.00th=[ 476054], 70.00th=[ 488637],
     | 80.00th=[ 505414], 90.00th=[ 534774], 95.00th=[ 574620],
     | 99.00th=[ 692061], 99.50th=[ 775947], 99.90th=[ 901776],
     | 99.95th=[1002439], 99.99th=[1027605]
   bw (  KiB/s): min= 1007, max=561211, per=12.50%, avg=5420.09, stdev=22421.58, samples=635
   iops        : min=    7, max= 4384, avg=41.88, stdev=175.17, samples=635
  lat (usec)   : 10=0.01%, 20=0.60%, 50=0.43%, 100=0.89%, 250=1.59%
  lat (usec)   : 500=1.35%, 750=2.08%, 1000=1.28%
  lat (msec)   : 2=3.51%, 4=2.99%, 10=1.38%, 20=0.86%, 50=1.58%
  lat (msec)   : 100=2.26%, 250=2.75%, 500=53.55%, 750=22.26%, 1000=0.58%
  lat (msec)   : 2000=0.05%
  cpu          : usr=0.04%, sys=0.06%, ctx=11641, majf=0, minf=2
  IO depths    : 1=0.7%, 2=1.7%, 4=3.6%, 8=50.5%, 16=43.5%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.2%, 8=1.2%, 16=0.6%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13748,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
TEST: (groupid=0, jobs=1): err= 0: pid=33564: Fri Jan  3 11:37:43 2025
  write: IOPS=42, BW=5393KiB/s (5522kB/s)(1718MiB/326244msec); 0 zone resets
    slat (nsec): min=1941, max=15733k, avg=34181.35, stdev=223355.55
    clat (usec): min=7, max=1072.8k, avg=379445.93, stdev=202553.81
     lat (usec): min=33, max=1072.8k, avg=379480.11, stdev=202508.21
    clat percentiles (usec):
     |  1.00th=[    47],  5.00th=[   502], 10.00th=[  1467], 20.00th=[ 92799],
     | 30.00th=[429917], 40.00th=[455082], 50.00th=[467665], 60.00th=[476054],
     | 70.00th=[488637], 80.00th=[505414], 90.00th=[534774], 95.00th=[574620],
     | 99.00th=[675283], 99.50th=[742392], 99.90th=[826278], 99.95th=[868221],
     | 99.99th=[910164]
   bw (  KiB/s): min= 1259, max=566055, per=12.49%, avg=5419.48, stdev=22559.28, samples=635
   iops        : min=    9, max= 4422, avg=41.89, stdev=176.25, samples=635
  lat (usec)   : 10=0.02%, 20=0.59%, 50=0.47%, 100=0.90%, 250=1.79%
  lat (usec)   : 500=1.23%, 750=1.59%, 1000=1.43%
  lat (msec)   : 2=3.43%, 4=3.19%, 10=1.47%, 20=0.72%, 50=1.67%
  lat (msec)   : 100=1.91%, 250=3.05%, 500=53.65%, 750=22.41%, 1000=0.49%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.04%, sys=0.07%, ctx=11865, majf=0, minf=2
  IO depths    : 1=0.7%, 2=1.6%, 4=3.6%, 8=50.3%, 16=43.8%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.3%, 8=1.1%, 16=0.7%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13745,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
TEST: (groupid=0, jobs=1): err= 0: pid=33565: Fri Jan  3 11:37:43 2025
  write: IOPS=43, BW=5528KiB/s (5661kB/s)(1762MiB/326278msec); 0 zone resets
    slat (nsec): min=1899, max=12111k, avg=35173.37, stdev=215307.27
    clat (usec): min=8, max=936097, avg=370148.08, stdev=209436.02
     lat (usec): min=34, max=936107, avg=370183.26, stdev=209391.60
    clat percentiles (usec):
     |  1.00th=[    40],  5.00th=[   233], 10.00th=[   832], 20.00th=[ 40109],
     | 30.00th=[417334], 40.00th=[455082], 50.00th=[467665], 60.00th=[476054],
     | 70.00th=[488637], 80.00th=[505414], 90.00th=[534774], 95.00th=[574620],
     | 99.00th=[692061], 99.50th=[750781], 99.90th=[851444], 99.95th=[868221],
     | 99.99th=[935330]
   bw (  KiB/s): min=  503, max=654023, per=12.81%, avg=5558.01, stdev=26031.61, samples=635
   iops        : min=    3, max= 5109, avg=42.95, stdev=203.37, samples=635
  lat (usec)   : 10=0.02%, 20=0.75%, 50=0.63%, 100=1.09%, 250=2.70%
  lat (usec)   : 500=1.88%, 750=2.33%, 1000=1.94%
  lat (msec)   : 2=3.18%, 4=2.23%, 10=1.30%, 20=0.99%, 50=1.48%
  lat (msec)   : 100=2.02%, 250=2.89%, 500=52.40%, 750=21.69%, 1000=0.50%
  cpu          : usr=0.06%, sys=0.05%, ctx=12269, majf=0, minf=2
  IO depths    : 1=0.9%, 2=2.0%, 4=4.1%, 8=50.3%, 16=42.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.2%, 8=1.1%, 16=0.8%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,14092,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
TEST: (groupid=0, jobs=1): err= 0: pid=33566: Fri Jan  3 11:37:43 2025
  write: IOPS=42, BW=5395KiB/s (5525kB/s)(1719MiB/326278msec); 0 zone resets
    slat (usec): min=2, max=43318, avg=36.57, stdev=401.73
    clat (usec): min=5, max=898588, avg=379206.65, stdev=202944.16
     lat (usec): min=35, max=898593, avg=379243.22, stdev=202895.42
    clat percentiles (usec):
     |  1.00th=[    46],  5.00th=[   424], 10.00th=[  1401], 20.00th=[ 93848],
     | 30.00th=[429917], 40.00th=[455082], 50.00th=[467665], 60.00th=[476054],
     | 70.00th=[488637], 80.00th=[505414], 90.00th=[534774], 95.00th=[574620],
     | 99.00th=[700449], 99.50th=[742392], 99.90th=[834667], 99.95th=[876610],
     | 99.99th=[901776]
   bw (  KiB/s): min= 1462, max=566055, per=12.50%, avg=5422.80, stdev=22570.22, samples=635
   iops        : min=   11, max= 4422, avg=41.89, stdev=176.34, samples=635
  lat (usec)   : 10=0.02%, 20=0.69%, 50=0.39%, 100=0.98%, 250=2.04%
  lat (usec)   : 500=1.34%, 750=1.93%, 1000=1.17%
  lat (msec)   : 2=3.32%, 4=2.78%, 10=1.41%, 20=0.85%, 50=1.53%
  lat (msec)   : 100=2.05%, 250=2.97%, 500=54.02%, 750=22.03%, 1000=0.47%
  cpu          : usr=0.06%, sys=0.05%, ctx=11785, majf=0, minf=2
  IO depths    : 1=0.7%, 2=1.7%, 4=3.7%, 8=50.1%, 16=43.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.0%, 8=1.3%, 16=0.7%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13753,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
TEST: (groupid=0, jobs=1): err= 0: pid=33567: Fri Jan  3 11:37:43 2025
  write: IOPS=41, BW=5289KiB/s (5416kB/s)(1685MiB/326244msec); 0 zone resets
    slat (usec): min=2, max=4449, avg=34.52, stdev=180.29
    clat (usec): min=10, max=904862, avg=386948.94, stdev=196969.83
     lat (usec): min=36, max=904867, avg=386983.45, stdev=196921.43
    clat percentiles (usec):
     |  1.00th=[    61],  5.00th=[   725], 10.00th=[  2311], 20.00th=[119014],
     | 30.00th=[438305], 40.00th=[455082], 50.00th=[467665], 60.00th=[480248],
     | 70.00th=[488637], 80.00th=[505414], 90.00th=[534774], 95.00th=[574620],
     | 99.00th=[683672], 99.50th=[742392], 99.90th=[834667], 99.95th=[884999],
     | 99.99th=[901776]
   bw (  KiB/s): min=  755, max=498996, per=12.25%, avg=5311.77, stdev=19933.19, samples=635
   iops        : min=    5, max= 3898, avg=41.04, stdev=155.73, samples=635
  lat (usec)   : 20=0.54%, 50=0.30%, 100=0.60%, 250=1.07%, 500=1.08%
  lat (usec)   : 750=1.53%, 1000=1.22%
  lat (msec)   : 2=3.06%, 4=2.82%, 10=1.98%, 20=0.93%, 50=1.60%
  lat (msec)   : 100=2.11%, 250=3.04%, 500=55.00%, 750=22.66%, 1000=0.45%
  cpu          : usr=0.05%, sys=0.06%, ctx=11654, majf=0, minf=2
  IO depths    : 1=0.6%, 2=1.4%, 4=3.4%, 8=49.9%, 16=44.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.3%, 8=1.1%, 16=0.6%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13481,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=42.4MiB/s (44.4MB/s), 5289KiB/s-5528KiB/s (5416kB/s-5661kB/s), io=13.5GiB (14.5GB), run=326190-326278msec

I think it is not worth letting this run for over a day. But since the performance hit is visible immediately in fio, I guess we can assume for now that the issue is not related to the network?

I also checked the SMART data, and I think we might be onto something there. One of the drives reported this:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   128   128   054    Old_age   Offline      -       108
  3 Spin_Up_Time            0x0007   158   158   024    Pre-fail  Always       -       417 (Average 417)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1487
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       4
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   140   140   020    Old_age   Offline      -       15
  9 Power_On_Hours          0x0012   094   094   000    Old_age   Always       -       44495
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
 22 Unknown_Attribute       0x0023   004   004   025    Pre-fail  Always   FAILING_NOW 55
192 Power-Off_Retract_Count 0x0032   090   090   000    Old_age   Always       -       12962
193 Load_Cycle_Count        0x0012   090   090   000    Old_age   Always       -       12962
194 Temperature_Celsius     0x0002   232   232   000    Old_age   Always       -       28 (Min/Max 20/40)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 10519 hours (438 days + 7 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 50 6a e9 40 00  17d+08:27:51.884  READ FPDMA QUEUED
  60 00 00 50 69 e9 40 00  17d+08:27:51.882  READ FPDMA QUEUED
  60 00 00 50 68 e9 40 00  17d+08:27:51.881  READ FPDMA QUEUED
  60 40 00 10 68 e9 40 00  17d+08:27:51.881  READ FPDMA QUEUED
  60 40 00 d0 67 e9 40 00  17d+08:27:51.881  READ FPDMA QUEUED

I don’t like the Reallocated_Event_Count, and I have no idea what attribute 22 is, but all my other drives report a raw value of 100 for it instead of 55.

So I guess I should first look into replacing that drive. I am a bit surprised this was not reported, even though I have regular SMART tests enabled.
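To see whether the scheduled tests actually ran and what they concluded, the drive's self-test log is worth a look; a minimal sketch (da0 is an assumed device name, repeat for each member disk):

```shell
# Self-test history (scheduled short/long tests) for one drive.
smartctl -l selftest /dev/da0
```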


Attribute 22 is the helium level. I suggest running Multi-Report to avoid future issues.

Thanks! I will look into Multi-Report. I ordered a new drive; let’s see if this fixes the issue. It will take some time until you hear back from me because of the initial burn-in etc. :slight_smile:

Thanks already, let’s hope this resolves my issue!

So I have an update. While waiting for my replacement disk, I got a mail from my TN server telling me that one .geli device is slowing down my array. I double-checked via glabel status, and it was indeed the same disk I already suspected of going bad. I have now finished the burn-in of my replacement drive and started the replacement process using the UI.

I am now a little worried, as the reported speed and remaining time look horrible:

pool: pool_1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jan 11 14:25:23 2025
        63.2G scanned at 185M/s, 1.41G issued at 4.13M/s, 46.3T total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                                  STATE     READ WRITE CKSUM
        pool_media                                            ONLINE       0     0     0
          raidz2-0                                            ONLINE       0     0     0
            gptid/565f8d23-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0
            gptid/56ed1a86-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0
            replacing-2                                       ONLINE       0     0     0
              gptid/57760dce-1e78-11ea-bfde-0cc47a688a52.eli  ONLINE       0     0     0
              gptid/7b90c30d-d01f-11ef-ad29-51fda28862e8.eli  ONLINE       0     0     0
            gptid/57f20fa0-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0
            gptid/587efb38-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0
            gptid/591283c8-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0

errors: No known data errors

The GUI is showing over 130 days to complete the resilver process… what could be going on here?

During burn-in, one badblocks run on the new 16 TB drive took about 16 hours, which I think should be fine. Is the system trying to read data from the drive being replaced during the resilver, which is again potentially slowing down the process? Could it be faster to just drop that one drive and rebuild from the remaining parity?

OK, no idea what was going on before, but the numbers have now improved dramatically. Not sure what TN was doing initially, but the estimates look better and are falling constantly:

pool: pool_1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jan 11 14:25:23 2025
        1.97T scanned at 970M/s, 481G issued at 231M/s, 46.3T total
        77.8G resilvered, 1.01% done, 2 days 09:56:20 to go
config:

        NAME                                                  STATE     READ WRITE CKSUM
        pool_media                                            ONLINE       0     0     0
          raidz2-0                                            ONLINE       0     0     0
            gptid/565f8d23-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0
            gptid/56ed1a86-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0
            replacing-2                                       ONLINE       0     0     0
              gptid/57760dce-1e78-11ea-bfde-0cc47a688a52.eli  ONLINE       0     0     0
              gptid/7b90c30d-d01f-11ef-ad29-51fda28862e8.eli  ONLINE       0     0     0  (resilvering)
            gptid/57f20fa0-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0
            gptid/587efb38-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0
            gptid/591283c8-1e78-11ea-bfde-0cc47a688a52.eli    ONLINE       0     0     0

I just noticed that my first zpool status did not yet actually say “resilvering” next to the new drive.

I believe it’s a two-part process.

The first part “assesses” what to do; I believe it readies itself to work as “sequentially” as possible across the drives. (Less stress, better performance.)

The second part is the actual resilver.

It’s this “divide by zero” chicanery that throws off the GUI’s ETA calculation.
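As a rough sanity check, the ETA the GUI works from is basically total size divided by the issued rate, and with only 4.13 M/s issued in the first status output, that lands right around the 130-day figure (numbers taken from the zpool status above; a back-of-the-envelope sketch, not the GUI’s exact formula):

```shell
# GUI-style resilver ETA: total pool data / issued rate, in days.
awk 'BEGIN {
  total = 46.3 * 1024^4;   # 46.3 TiB in bytes
  rate  = 4.13 * 1024^2;   # 4.13 MiB/s issued, in bytes per second
  printf "%.0f days\n", total / rate / 86400
}'
```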


You might find this funny:

Here is what I had to stare at when I was resilvering a mirror that only contained 7 TiB worth of data. :wink:

Time Remaining: 7167064838 days, 6 hours, 11 minutes, 12 seconds

Oof, yeah that is a scary big number :smiley:

My number is now down to below 2 days. More like what I expected :stuck_out_tongue:

The GUI is trying to “divide by zero”, that’s why. :sweat_smile:

Just know that a “scrub” will probably automatically trigger upon completion of the resilver. So don’t plan to power off your system just yet.