Long resilver time TL;DR

tuxhobo · December 31, 2024, 1:25am

Apologies in advance for the length of this post. There is a lot to digest about the question that is posed at end about why the resilver time of a single drive is taking so long. It may also be a cautionary tail about purchasing equipment harvested from enterprise servers.

I purchased four 10TB HDDs from eBay describe as:
“HGST 10TB 3.5” SAS HDD 7200 RPM HUH721010AL5204, P/N: 0F27385 GRADE A"
These are labeled as Advanced Format (AF) meaning sector size greater than 512 bytes

The intention for this purchase is to increase the size of a RAIDZ1 ZFS pool containing 4 x 4TB SAS HDDs (logical sector size 512). This pool has been running for a couple of years now and is reaching 75% full and needs expansion.

I had an existing Seagate 10TB 4kn drive and I resilvered the first drive successfully while waiting for the new drives to arrive. The resilver was an over night success completing in about 8 to 10 hours. I did not track this time, but it established the expectation for completion time.

When the new drives arrived, TrueNAS Scale did not recognize them and reported zero size. The dmesg report: cannot recognize 520 byte sector size.

Smartctl reports the drives as:
NETAPP X377_HLBRE10TA07

Which are harvested from a proprietary NAS. The HGST label was still attached, but they are internally reporting the NETAPP part number.

This post describes a simple method for reformatting.

sudo sg_format --size=4096 /dev/<your device>

I reformatted the fist drive to a 4K logical sector size and it resilvered in about the same time as the Seagate drive. The second reformatted drive completed in about the same time.

At this point, three of four 4TB HDDs have been resilvered with 10TB drives. The pool reported as normal but mixed size (expected). Now the last replacement was started and as I write this the resilver has been running for about 30 hours and is 36% complete.

I do not want to interrupt this process, because no errors are reported by zpool status. But it does seem unusual.
Does anyone have thoughts about why this last drive resilver is taking so long?

Description of the TrueNAS installation:

Dragonfish-24.04.2
Dell R720 w/ single Intel(R) Xeon(R) E5-2630
96G ECC RAM
Dell Perc H310 mini controller w/ IT firmware installed

Oh, yes I do have a backup of my data just in case the very worst happens.

Captain_Morgan · December 31, 2024, 5:17am

Not sure why, but:

How long did each of the other resilver replacements take?

You may want to look at drive statistics… is there one drive that seems to be struggling?

I’d be looking at latency, % busy and pending IO/s

tuxhobo · December 31, 2024, 3:30pm

Thank you for the feedback!
The first three replacement drives all resilvered in about the same time but I did not track them precisely. For two of them the replacement started in the evening and was complete in the morning (8 to10 hours) One of them was started in the morning and was done in the afternoon (again about 8 to10 hours) It is the last one that is still running (now about 40 + hours and counting). It continues to proceed without error. At this time, zpool status is reporting 63% complete and 20 hours remaining to finish.

The disk I/O status results of the zpool drives: (sdc is being replaced by sdf)


		meanrd	meanwr	reread	rewrite
/dev/sda	9.1TiB	14MB/s	52KB/s	1121433	0
/dev/sde	9.1TiB	14MB/s	55KB/s	99	0
/dev/sdh	9.1TiB	13MB/s	52KB/s	0	0
/dev/sdc	3.64TiB	14MB/s	55KB/s	0	0
/dev/sdf	9.1TiB	1.2KiB/s	13.4 MB/s	304108	45197

The health status returned by smartctl for all the drives is “OK” and he uncorrected error count and the grown defect list is zero. This indicates to me that the media is not contributing to the trouble. The Seagate drive is /dev/sdh (native 4kn). All the other 10TB drives are the HGST drives reformatted to 4k sector size. The last 4TB drive is HGST and is native 512 sector size.

The mean read and mean write values come from the TrueNAS reporting page. These results do not seem unusual. The substantially greater mean write result for sdf (multiple orders of magnitude) feels OK to me because this is the replacement drive that is resilvering. (lots of writes to get the job done) and it is about the same as the mean read rate of the other drives in the pool. The low read rate to /dev/sdf indicates that the resilver is a blind write of blocks. (possibly relies on the drive hardware feedback for write success??)

More interesting is the reread and rewrite values from smartctl. These are lifetime counts and the rewrite value reported by /dev/sdf is growing during the resilver. In the few minutes it took to write this reply it has increased by 1200. This may be an indication of a controller, connector or the drive itself. Further investigation about this after the resilver is complete. But the rate of increase (about 2 rewrites per sec) does not feel like the root cause. But for sure it could be a contributor.

Similar results reported by zpool iostat

admin@lala120:~$ sudo zpool iostat -v samos
                                              capacity     operations     bandwidth
pool                                        alloc   free   read  write   read  write
------------------------------------------  -----  -----  -----  -----  -----  -----
samos                                       10.3T  4.29T    781    224  60.4M  15.1M
  raidz1-0                                  10.3T  4.29T    781    224  60.4M  15.1M
    7e88adf4-f27c-4f48-8a2c-9ed2ffd9ab50        -      -    201      7  15.3M   110K
    25075f9d-1fcc-427c-a6b7-efd1d942191e        -      -    202      7  15.0M   109K
    e14286ea-bd58-47b3-bd59-ac8cb1e12693        -      -    201      7  15.3M   110K
    replacing-3                                 -      -    176    203  14.9M  14.8M
      6714be3a-9cd4-4931-aa82-230863bde3eb      -      -    175      7  14.9M   109K
      119ebdcf-e50e-4ef7-a6d2-e6d8402d2093      -      -      0    195      0  14.7M
------------------------------------------  -----  -----  -----  -----  -----  -----

With the exception of the /dev/sdf rewrite performance these results seem within nominal range and does not explain the excessive resilver time of the last drive.

How do I look at latency, % busy and pending IO/s?

Captain_Morgan · December 31, 2024, 5:25pm

Its very high… I’d be surprised if you have 2 problems going on. Its usually only one.

The sdf rewrites will cause a pause in data transfers… which will impact bandwidth. If the disk writing is slowed down, ZFS will slow down the reads of the other disks to match the speeds.

If sda reread stats are not increasing, its unlikely to be that.

Half filling a 10TB drive at 10MB/s will take 500,000 seconds or 6 days…
If it were writing at 40MB/s it would be 4x faster.

tuxhobo · December 31, 2024, 10:10pm

Yes, the reread counts for all drives including sdf has not changed during this resilver sequence. It is only the sdf rewrite count that is increasing. Admittedly, I have not been tracking the time between measurement reads of sdf rewrite counts.My best guess now over 7 hours or so is about one rewrite every 2.25 seconds. I’ll see if I can improve the measurement accuracy of this rate. (my first estimate of 2/sec probably underestimated the time between measurements)

At this point we have a suspect: rewrite occurrences on sdf.

I have one more spare 10TB drive and will resilver it in a different SAS slot when this resilver completes. Then resilver sdf in a different SAS slot to see if the result is different for either drive. I want to eliminate the disk controller and cable as a factor in this trouble.
If sdf continues to be slow in a different slot then I will return it to the merchant as failed.

tuxhobo · January 3, 2025, 3:18pm

I replaced the 10TB disk that was initially suspect. The resilver took about 44 hours. This is longer than the resilver from each of the first three 4TB drive to a 10TB drive, but shorter than the 60 or so hours of the last (4 of 4) 4TB to 10TB drive.

I’m suspecting now that the part of the additional resilver time was because there were no 4TB drives remaining in the pool.

There is more strangeness with this pool now. After the last 4TB drive was replaced in the pool, I expected that the total size of the pool would change to reflect the larger capacity. It did not!

TrueNAS is reporting the drive capacity as 9.1TB but the pool used/available storage has not changed. The capacity still looks like the pool as it was with 4x4TB devices int he pool.

Stranger yet, I put one of the 4TB drives back and it is resilvering as a replacement of one of the 10TB drives. I did not think that that was possible!

tuxhobo · January 30, 2025, 4:20am

There is no solution to describe. The resilver operation just takes time to complete. I have no explanation about the expanding rewrite count. ZFS does not report any trouble with the drive. I have not investigated further now that the drives are operating in the pool. Maybe more later on this topic.

The problem with the pool expansion when larger drives are installed appears to be a problem with TrueNAS Scale: [Should a pool auto expand where it can?]

There is a Jira report referenced in this forum post about the trouble.

Using TrueNAS Core I’ve successfully expanded pools this way multiple times. This appears to be a continuing problem with Scale.

The real surprise was that with 4 x 10TB installed and while the pool did not expand, Scale permitted a replacement of all 10TB drives with 4TB drive. When multiple replacement drives are scheduled at once, the ZFS resilver operates sequentially over each drive. In my case it took nearly a week to complete the resilver.

In the end, I just created a new pool with 4x 10TB drives and used rsync to copy the data. The result is what I wanted but this solution needed four open drive bays and a couple of weeks of trial and error. I happened to have the open bays on this server, but upgrades could be troublesome on smaller systems.

gavin · May 15, 2025, 3:22am

Hi,

I had the same issue when I received some HGST drives. Mine are 10.0TB / 12 Gbps HGST (former NetApp drives)

Long story short, by default mine didn’t have ‘write cache’ enabled and would take a very long time and even the speed I was seeing from the disk were terrible.

The solution that I have is to have a script with the disk ids which enable the write cache on boot, since these disks can’t have the write cache enabled via their firmware as default this was the best option I had.

to find the disk ID

1 - put the disk in, i run dmesg to see what the sd(x) is. lets say sdg
2 - run ls -l /dev/disk/by-id/ | grep sdg
this will give you

root@nas1[~]# ls -l /dev/disk/by-id/ | grep sdg
lrwxrwxrwx 1 root root  9 May 15 13:02 scsi-35000cca26a088fa0 -> ../../sdg
lrwxrwxrwx 1 root root 10 May 15 13:02 scsi-35000cca26a088fa0-part1 -> ../../sdg1
lrwxrwxrwx 1 root root  9 May 15 13:02 wwn-0x5000cca26a088fa0 -> ../../sdg
lrwxrwxrwx 1 root root 10 May 15 13:02 wwn-0x5000cca26a088fa0-part1 -> ../../sdg1

3 - copy the scsi line, in this case scsi-35000cca26a088fa0
4 - create script i called mine enable_wce.sh and add the disk id into the DISK_IDS part. as you can see by my script I have 3 disks which will need the write cache enabled each time.

#!/bin/bash

LOGFILE="/var/log/wce.log"

# List of persistent disk IDs to enable write cache on
# to find the new ID run ls -l /dev/disk/by-id/ | grep sdx

DISK_IDS=(
  "/dev/disk/by-id/scsi-35000cca26b7ff1fc"
  "/dev/disk/by-id/scsi-35000cca26c1c6af8"
  "/dev/disk/by-id/scsi-35000cca26a088fa0"
)

echo "$(date): Starting WCE enable script" >> "$LOGFILE"

for DISK in "${DISK_IDS[@]}"; do
  if [ -e "$DISK" ]; then
    echo "$(date): Setting WCE on $DISK" >> "$LOGFILE"
    if sdparm --set=WCE "$DISK" >> "$LOGFILE" 2>&1; then
      echo "$(date): SUCCESS - Write cache enabled on $DISK" >> "$LOGFILE"
    else
      echo "$(date): ERROR - Failed to enable WCE on $DISK" >> "$LOGFILE"
    fi
  else
    echo "$(date): WARNING - Disk $DISK not found" >> "$LOGFILE"
  fi
done

5 - make it executable chmod +x enable_wce.sh

6 - run the script ./enable_wce.sh

To auto enable it upon reboot
add a Post Init script in the WebUI under system → advanced settings - init/shutdown scripts.

below is a drive I just got and am adding back into the pool.

  pool: pool0
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu May 15 13:02:53 2025
        6.31T / 16.0T scanned at 7.08G/s, 210G / 16.0T issued at 236M/s
        50.5G resilvered, 1.28% done, 19:29:54 to go
config:

        NAME                                        STATE     READ WRITE CKSUM
        pool0                                       DEGRADED     0     0     0
          raidz1-0                                  DEGRADED     0     0     0
            fc08a2af-b7b5-4d58-bb47-748196c50e91    ONLINE       0     0     0
            replacing-1                             DEGRADED     0     0     0
              2f382f97-0e88-4ffd-83a7-c544c86a8d7f  REMOVED      0     0     0
              91e585b7-fc7a-461f-a24d-ae96a462a8a3  ONLINE       0     0     0  (resilvering)
            881f18cb-5862-4f6c-9876-196d75e2a727    ONLINE       0     0     0
            67b9b9e0-00c3-43fe-a430-6d891d31520b    ONLINE       0     0     0
        logs
          255157f6-67da-4f64-95d9-2b883f314f54      ONLINE       0     0     0

errors: No known data errors

Hope this helps anyway else.