Drive replacement: faulted drive will not go offline

Hello,

I have two 8-bay Dell R520s running TrueNAS SCALE 22 in an isolated network. They only handle my pictures. 01 is my primary server running RAIDZ3 and 02 is my backup. Every night at midnight 01 takes a snapshot, and at 3am it replicates over to my backup server (02).

I found out on Monday that my pool was unhealthy and a drive was showing faulted. I saw about 21 read errors at that point and went ahead and ordered a new drive.

Last night I tried to put the drive in offline mode. It just spun on “please wait” until the screen refreshed, and the drive still shows faulted. Now the “please wait” goes away faster. I read on an old forum post that faulted drives are already offline, so I could just remove the drive, put a new drive in, and do the replacement. That also didn’t work. The new drive did show up on the host, but it wouldn’t let me replace sdg (bad drive) with sdj (new). I put the old drive back in, but now it shows as sdk instead of sdg. I still can’t get it to go offline.

I’m reluctant to power off the host as I know there will be data loss. When I ran a manual snapshot and tried to replicate it over to the backup server, the job didn’t seem to accept the manual snapshot.

What can I do to replace this drive?

RAIDZ3 has triple redundancy, so a single bad drive should not lose your data. So I am not sure why “[you are] reluctant to power off the host as [you] know there will be data loss” - this shouldn’t happen, but if it does there is likely some other issue that needs resolving.

My advice - and I may be wrong, so get a second opinion - would be:

  • Reboot your server. Hopefully the pool will remain online and you can then simply resilver to the new drive by doing a ZFS replace.

  • If for some reason after the reboot the pool doesn’t come online, we can help further.
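If the web UI still refuses the replace after the reboot, it can help to check from a shell which device ZFS actually considers faulted, because names like sdg can change between boots (as it did here, sdg to sdk). Below is a rough Python sketch, not an official TrueNAS tool: it scans text in the general shape of `zpool status` output for devices that are not healthy and builds the matching `zpool replace` command without running it. The pool name `tank`, the sample output and both helper functions are made up for illustration; on TrueNAS the status output may list partition UUIDs rather than sdX names, and the web UI remains the usual way to do the replacement.

```python
# Hypothetical sample in the general shape of `zpool status` output.
# The pool name "tank" and the device list are invented for this example.
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      raidz3-0  DEGRADED     0     0     0
        sda     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdg     FAULTED     21     0     0  too many errors

errors: No known data errors
"""

# States that suggest a leaf device needs attention (an assumption, not exhaustive).
BAD_STATES = {"FAULTED", "UNAVAIL", "REMOVED"}


def find_unhealthy_devices(status_text, pool):
    """Return (name, state) pairs for devices in a bad state.

    Skips the pool's own line and grouping vdevs such as raidz3-0 or mirror-0.
    """
    unhealthy = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # device table starts after this header
            continue
        if not in_config or len(fields) < 2:
            continue
        name, state = fields[0], fields[1]
        if name == pool or name.startswith(("raidz", "mirror")):
            continue
        if state in BAD_STATES:
            unhealthy.append((name, state))
    return unhealthy


def replace_command(pool, old_device, new_device):
    """Build the argument list for `zpool replace`; this does not execute it."""
    return ["zpool", "replace", pool, old_device, new_device]


for name, state in find_unhealthy_devices(SAMPLE_STATUS, "tank"):
    # "sdj" stands in for the new drive, as in the original post.
    print(state, name, "->", " ".join(replace_command("tank", name, "sdj")))
```

In practice you would feed the function the real output of `zpool status` for your pool, double-check the old and new device identifiers by hand, and only then run the replace (or do it through the UI).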
