Just upgraded 3 disks - is one of them faulty?

Hello! I’ve been using FreeNAS and TrueNAS for a couple of years now and decided to replace my existing disks with higher-capacity ones, resilvering after each change, and then expand my existing pool. My NAS has room for 4 disks. I started with a single disk a month ago and got 3 more this week to finish the process.

Regarding the 3 new disks, the first two replacements/resilvers went fine. However, I noticed the third resilver was taking much longer than the previous ones. I made sure there were no scheduled S.M.A.R.T. tests or scrubs running. Looking at the disk activity graph in the web interface shows something concerning:

You can clearly see two peaks (around 100 MiB/s) which correspond to the first two resilvers (which completed in around 4 hours each), followed by a lower, flat line during the third resilver (around 20 MiB/s), indicating it was much slower and took longer.
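For a rough sense of scale, here is a minimal sketch of the arithmetic behind those numbers. It assumes each resilver writes about the same amount of data and that the rates on the graph were sustained for the whole run; both are simplifications, and the helper name `resilver_hours` is just for illustration.

```python
MIB = 1024 ** 2

def resilver_hours(data_bytes: int, rate_mib_s: float) -> float:
    """Hours needed to write data_bytes at a steady rate given in MiB/s."""
    return data_bytes / (rate_mib_s * MIB) / 3600

# Data written per disk, inferred from the first two resilvers:
# roughly 100 MiB/s sustained for roughly 4 hours (assumption).
data_bytes = 100 * MIB * 4 * 3600

fast = resilver_hours(data_bytes, 100)
slow = resilver_hours(data_bytes, 20)

print(f"data per disk: {data_bytes / 1024**4:.2f} TiB")
print(f"at 100 MiB/s: {fast:.1f} h, at 20 MiB/s: {slow:.1f} h")
```

Under those assumptions, a fivefold drop in throughput turns a 4-hour resilver into roughly a 20-hour one.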

Would this be enough to conclude that disk is faulty and has limited write speed?

I also ran a write speed test with dd following the tips in this archived post (disable cache and compression for the dataset during the test).

The three new disks performed at around 80-100MiB/s for the duration of the test.
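For anyone who wants to repeat this kind of test without dd, below is a minimal Python sketch of a sequential write test. It is not the procedure from the linked post: the function name, sizes, and target directory are assumptions for illustration, and it measures throughput through the filesystem (and therefore the whole pool), not a single raw disk. As with dd, compression should be off on the test dataset, and the test size should be large enough that caching does not dominate.

```python
import os
import tempfile
import time

def sequential_write_mib_s(directory: str, total_mib: int = 64, chunk_mib: int = 4) -> float:
    """Write about total_mib of data to a temp file in `directory`; return MiB/s."""
    chunk = os.urandom(chunk_mib * 1024 * 1024)  # random data, so it does not compress
    chunks = max(1, total_mib // chunk_mib)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # wait until the data has been handed to the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the test file
    return (chunks * chunk_mib) / elapsed

if __name__ == "__main__":
    # The system temp directory is used here only so the sketch runs anywhere;
    # point it at a directory on the pool under test instead.
    rate = sequential_write_mib_s(tempfile.gettempdir(), total_mib=64)
    print(f"{rate:.1f} MiB/s")
```

A 64 MiB run only proves the script works. For a meaningful number the test file should be many gigabytes, and the rate should be watched over the whole run rather than only at the end.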

I am also currently running a read speed test and will post the results.

I’m confused and concerned as to why that last resilver took so long.

Can anyone offer any suggestions or ideas?

What are the model numbers of your disks? (You should be able to get them from the disk list.)

This is why it is a good idea to burn-in drives prior to deployment…


This is enough to suspect you might have an SMR drive in the mix.

Indeed. The time for it to fail or behave strangely is before it’s part of your pool.


The first disk I replaced is a Toshiba N300, reported as TOSHIBA_HDWG440 by TrueNAS.
The other 3 are Seagate Ironwolf drives, reported as ST4000VN006-3CW104 by TrueNAS.

They should all be CMR as far as I’m aware.

I definitely learned my lesson regarding drive burn-in :upside_down_face:

Still, is there any way to identify which drive could’ve caused the slow resilver (or if a drive even is the cause to begin with)?

I still haven’t expanded the pool, so I could offline each new disk, replace it with its previous counterpart and test it individually. Would this be the way to go?

Reply to Fleshmauler:

Exactly! I actually do that by applying full VeraCrypt encryption to the new hard drive. The drive sits in an actively cooled USB enclosure. That way I don’t need to mess with my hardware and can still use the PC.

I just check the SMART data before and after the full write. :face_with_peeking_eye:
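A minimal sketch of that before/after comparison, assuming the raw SMART values have already been copied into plain dictionaries keyed by attribute ID. The attribute IDs 5, 197 and 198 are the standard reallocated, pending and offline-uncorrectable sector counters; the function name and the sample values are made up for illustration.

```python
# SMART attributes where any increase during burn-in is a red flag.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

def smart_changes(before: dict[int, int], after: dict[int, int]) -> dict[str, tuple[int, int]]:
    """Return the watched attributes whose raw value changed, as (old, new)."""
    changes = {}
    for attr_id, name in WATCHED.items():
        old = before.get(attr_id, 0)
        new = after.get(attr_id, 0)
        if new != old:
            changes[name] = (old, new)
    return changes

# Hypothetical snapshots taken before and after a full-disk write.
before = {5: 0, 197: 0, 198: 0}
after = {5: 0, 197: 8, 198: 0}

print(smart_changes(before, after))
```

An empty result means none of the watched counters moved during the write. That is a good sign, though it does not prove the drive is healthy.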

Check the SMART values of the 3 new disks. Anything suspicious there?

Nothing of note in S.M.A.R.T.

No errors logged. The raw error (read, seek, ECC recovered) rates for all 3 new disks convert to 0 using this neat calculator someone made.
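For context on why large raw values can "convert to 0": Seagate drives are commonly described as packing two numbers into the 48-bit raw value of attributes like Raw_Read_Error_Rate and Seek_Error_Rate, with the upper 16 bits holding the error count and the lower 32 bits holding the operation count. The sketch below assumes that layout, which is a community interpretation rather than anything confirmed here, and it is not necessarily what the linked calculator does. The function name and sample value are hypothetical.

```python
def decode_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumed layout: upper 16 bits = errors, lower 32 bits = operations.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations

# Hypothetical raw value as it might appear in the RAW_VALUE column.
errors, operations = decode_seagate_raw(123456789)
print(f"errors={errors}, operations={operations}")
```

Under this reading, a large raw number with nothing in the upper 16 bits just means many operations and zero errors. Other vendors, including Toshiba, report these attributes differently, so the decoding should not be applied to the N300.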

I have offlined the last disk I changed (the one that resilvered slowly, let’s call it disk ‘B’) and replaced it with the disk previously installed in that slot (disk ‘A’). The resilvering of disk ‘A’ is running at the expected speed (~100 MiB/s). I am also now running badblocks on disk ‘B’.

Curiously, TrueNAS originally gave me the following error when trying to replace disk ‘B’ with disk ‘A’:

Invalid argument during seek for write on /dev/sdX

The error cleared up with a reboot.

Maybe TrueNAS has a bug and just doesn’t like doing too many resilvers in sequence? If badblocks and the subsequent S.M.A.R.T. tests don’t reveal anything wrong with disk ‘B’, I might just try replacing it again after a reboot and see if it performs better.