Hello there, TrueNAS Community. I’m new to TrueNAS and to how its RAID system works.
I want to test the system’s alerts for when a drive goes down and the RAID containing that drive becomes DEGRADED. I noticed a feature in the web UI that lets me take a drive offline. In theory, this should let me test the alert. However, I’m not sure whether, once I set the drive back online, the RAID will rebuild automatically or whether I will have to configure it again.
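If you want to double-check the result of such a test independently of the TrueNAS alerts, one option is to read the output of `zpool status` from a shell. The Python sketch below parses that text and lists every pool whose state is not ONLINE. It assumes the usual `pool:` / `state:` lines; the function names and the abbreviated SAMPLE output are illustrative, not captured from a real system.

```python
import re

def pool_states(status_text: str) -> dict:
    """Map each pool name to the state shown in `zpool status` text."""
    states = {}
    current = None
    for line in status_text.splitlines():
        m = re.match(r"\s*pool:\s*(\S+)", line)
        if m:
            current = m.group(1)          # start of a new pool section
            continue
        m = re.match(r"\s*state:\s*(\S+)", line)
        if m and current is not None:
            states[current] = m.group(1)  # e.g. ONLINE, DEGRADED
            current = None
    return states

def unhealthy_pools(status_text: str) -> list:
    """Names of pools whose state is anything other than ONLINE."""
    return sorted(name for name, state in pool_states(status_text).items()
                  if state != "ONLINE")

# Abbreviated, illustrative output after offlining one disk of a RAIDZ2 pool.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            sdf     OFFLINE      0     0     0
"""

print(unhealthy_pools(SAMPLE))  # ['tank']
```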
I know of other RAID systems that rebuild as soon as they detect another compatible drive, but I don’t know whether ZFS RAIDs work the same way or differently.
I’ll be grateful if someone can clarify this to me. Thank you for reading!
As far as I understand it, you need to tell TrueNAS that you are replacing the disk and, if there is already something on the new disk, tell it to discard that data. See Replacing Disks.
A tip: when I was considering moving to TrueNAS, I created a virtual machine with a few virtual disks so I could experiment without risking any data.
Hardware RAID and ZFS RAIDZ work a bit differently.
With hardware RAID, no matter what, as soon as a drive is removed and inserted back into the array, the entire disk is rewritten, even if only a few MB of data are present in the array.
By comparison, ZFS will only focus on how much data is actually used on a disk. So if you have a few MB of data, only the few MB of data required to restore redundancy is written to the drive.
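To make the difference concrete, here is a toy Python model comparing how many bytes each approach writes to the returning disk. The block size, disk size and amount of data are made-up numbers, and real ZFS uses variable block sizes (in RAIDZ each disk holds only its share of data and parity), so treat this only as an illustration of the arithmetic.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024  # hypothetical fixed block size (128 KiB) for this model

@dataclass
class MemberDisk:
    size_blocks: int                             # total capacity in blocks
    allocated: set = field(default_factory=set)  # block numbers holding live pool data

def hardware_raid_rebuild_bytes(disk: MemberDisk) -> int:
    # Hardware RAID rebuild: every block of the member disk is rewritten.
    return disk.size_blocks * BLOCK_SIZE

def zfs_resilver_bytes(disk: MemberDisk) -> int:
    # ZFS resilver: only blocks that hold live data are reconstructed.
    return len(disk.allocated) * BLOCK_SIZE

# Roughly a 3.9 TB disk that holds only 40 blocks (5 MiB) of pool data.
disk = MemberDisk(size_blocks=30_000_000, allocated=set(range(40)))
print(hardware_raid_rebuild_bytes(disk))  # 3932160000000
print(zfs_resilver_bytes(disk))           # 5242880
```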
When ZFS sees that a disk was removed and added back, it will first check whether the disk is the missing one and, if so, start resilvering the pool. I think it might compare the metadata to decide whether a resilver is needed, or it might start resilvering but only recover the missing bits. Either way, it won’t touch valid blocks.
ZFS will tell you how much data has been resilvered (actively rewritten) in the process. It could be just a few blocks.
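As far as I know, the mechanism behind "only the missing bits" is the dirty time log (DTL): for each device, ZFS records the range of transaction groups (txgs) in which that device missed writes, and every block records the txg in which it was born. The Python sketch below models that idea with made-up block names and txg numbers; it ignores freed blocks, metadata and the real on-disk traversal, so it only illustrates why a briefly offlined disk resilvers just a few blocks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    name: str
    birth_txg: int  # transaction group (txg) in which the block was written

def blocks_to_resilver(blocks, first_missed_txg, last_missed_txg):
    """Return the blocks the returning disk missed while it was offline.

    Simplified model: only blocks born in the missed txg range need to be
    rewritten; blocks written before the outage are already valid on the disk,
    and blocks written after it came back were written to it normally.
    """
    return [b.name for b in blocks
            if first_missed_txg <= b.birth_txg <= last_missed_txg]

pool_blocks = [Block("a", 5), Block("b", 11), Block("c", 14), Block("d", 20)]
# Disk offlined after txg 10 and back online from txg 15: txgs 11..14 were missed.
print(blocks_to_resilver(pool_blocks, 11, 14))  # ['b', 'c']
```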
If you decide to replace the drive and choose to wipe the data on the new drive, resilvering will recreate the redundancy on that drive based on the amount of data in use.
In my case, the RAIDs are made of 6 drives. If I understand your explanation correctly, this means that if I take, for example, the ‘sdf’ drive offline, the RAID that uses that drive will become DEGRADED, and when I bring the exact same drive back online, the RAID will resilver by itself. And because the RAID has 4 storage drives and 2 spares, I shouldn’t lose data, right?
You should be able to evaluate the different scenarios based on the different .
If you offline “sdf”, then “the RAID that uses that drive will get DEGRADED” is better worded as “the pool with the missing ‘sdf’ drive will become degraded”.
If your pool is a 6-wide RAIDZ2, then you get 4 disks of data capacity with 2 disks of redundancy, but that is not the issue I described in your case. With RAIDZ2 you will be fine; the issue is related to importing a drive into a faulted one.
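For the 6-wide RAIDZ2 case, the arithmetic can be sketched in Python. One nuance worth knowing: RAIDZ2 spreads parity across all six disks rather than dedicating two of them, and they are not spares in the ZFS sense (hot spares are separate devices added to the pool), so "4 + 2" is a capacity equivalence. The 4 TB disk size is a made-up example, the usable figure ignores RAIDZ padding and metadata overhead, and the function names are hypothetical.

```python
def raidz_summary(width: int, parity: int, disk_tb: float) -> dict:
    """Rough RAIDZ layout numbers, ignoring padding and metadata overhead."""
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity level must be 1, 2 or 3")
    if width <= parity:
        raise ValueError("vdev must have more disks than its parity level")
    return {
        "data_disks": width - parity,             # capacity-equivalent data disks
        "tolerated_failures": parity,             # disks that can be lost without data loss
        "approx_usable_tb": (width - parity) * disk_tb,
    }

def vdev_condition(failed_disks: int, parity: int) -> str:
    # "LOST" is a label for this sketch only, not a state name printed by zpool.
    if failed_disks == 0:
        return "ONLINE"
    if failed_disks <= parity:
        return "DEGRADED"
    return "LOST"

print(raidz_summary(6, 2, 4.0))
# {'data_disks': 4, 'tolerated_failures': 2, 'approx_usable_tb': 16.0}
for failed in range(4):
    print(failed, vdev_condition(failed, 2))
```

With one disk offlined for the alert test, the vdev is DEGRADED but still has one disk of redundancy left.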
So you need to read my post again, search the internet and learn from it. You need to corroborate my findings against the noise.
There are a lot of subtleties and nuances in the ZFS dialect that you need to be aware of.