Hello,
We have a TrueNAS Core system with 1 PB of storage that has reached 100% capacity (don’t ask me why they filled it up). At the same time, one drive failed and was removed. The system has 10 vdevs, all configured as RAIDZ2.
No matter what we try, the system fails to boot while trying to mount the pool and consistently displays the following message:
Warning: pool ‘Big_pool’ has encountered an uncorrectable I/O failure and has been suspended.
At one point, I was able to mount the pool as read-only, but afterward, I was no longer able to do so. Every command I try to run causes the system to hang for a while. I can see the drive activity (LEDs blinking), but eventually, it returns to the same error message.
If I boot the server without the drives, the system starts up just fine, so it doesn’t appear to be a server hardware issue.
When I hot-plug one JBOD after the system has booted, it successfully scans and detects all drives. Then, if I plug in the other JBOD, it also detects all drives correctly. I am able to export the pool, but cannot import it.
The server has plenty of RAM, and the boot drive is healthy.
Is there anything I can do to mount the pool (even temporarily) in order to recover some files and free up space? Unfortunately, this is our only backup.
A complete, detailed listing of the hardware and OS version may be helpful for others to reply.
It’s an Intel Xeon 32-core server, 256 GB RAM, 9400 SAS card, 2 x JBOD with 60 x 14 TB drives. TrueNAS Core v13.
I see you’ve decided that it’s not the hardware, and doubled down by not saying practically anything at all about the specific hardware in use anyway…
I can only conclude that you are happy solving this possible data loss disaster yourself or, failing that, rebuilding the backup from your presumed 1 PB primary storage (you’re lucky only your backup was affected!).
I have two controllers, and I’ve already swapped both storage units. Both controllers are able to see the drives, which tells me this isn’t a SAS issue.
I also booted the system without the drives, and it boots up fine, so it’s not a boot drive issue either.
I was thinking of running zpool clear to lift the suspension. The problem is that the drives were filled to 100%, and now the system is so unresponsive that I can’t run many commands; even zpool status just hangs the system.
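For reference, the read-only import and zpool clear steps mentioned above could be sketched as the shell script below. It is only a sketch under stated assumptions: the helper names (`run`, `list_importable`, `import_readonly`, `clear_errors`), the `DRY_RUN` switch, and the `/mnt` alternate root are made up for illustration, and the pool name is taken from the warning message. By default it prints each command instead of executing it, so nothing touches the pool until the output has been reviewed.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default) commands are printed, not executed.
POOL="Big_pool"            # pool name from the warning message
ALTROOT="/mnt"             # assumption: alternate root to mount datasets under
DRY_RUN="${DRY_RUN:-1}"    # hypothetical safety switch, not a zpool feature

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# List pools that are available for import, without importing anything.
list_importable() { run zpool import; }

# Import with the readonly property set, so nothing is written to the pool.
import_readonly() { run zpool import -o readonly=on -R "$ALTROOT" "$POOL"; }

# Clear the pool's error state so that suspended I/O can resume.
clear_errors() { run zpool clear "$POOL"; }

list_importable
import_readonly
```

Setting `DRY_RUN=0` would execute the commands for real. Whether a read-only import gets through on a suspended, completely full pool depends on what caused the I/O failure, so it may well hang in the same way as the other commands.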
We really need all the details on the system to give good advice. We can only go off what you provide in this thread. We are not there to see what your entire hardware setup is nor how you have TrueNAS Core set up.
The easiest option is to rebuild the entire pool and its vdevs and restore the data from elsewhere. You may need to add more storage before restoring. From the ZFS Primer article: “At 90% capacity, ZFS switches from performance- to space-based optimization, which has massive performance implications. For maximum write performance and to prevent problems with drive replacement, add more capacity before a pool reaches 80%.” If you are using block storage, 50% or lower is recommended.
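To put those percentages into numbers, here is a rough sketch of the arithmetic in shell. It assumes the pool's 1 PB is 1000 TB of usable space, which is a simplification, and the function names `threshold_tb` and `to_free_tb` are made up for this example.

```shell
#!/bin/sh
# threshold_tb SIZE_TB PERCENT: space used at that fill level (integer TB).
threshold_tb() { echo $(( $1 * $2 / 100 )); }

# to_free_tb USED_TB SIZE_TB PERCENT: TB that must be removed to get
# down to PERCENT full; prints 0 if the pool is already at or below it.
to_free_tb() {
    target=$(( $2 * $3 / 100 ))
    if [ "$1" -gt "$target" ]; then
        echo $(( $1 - target ))
    else
        echo 0
    fi
}

POOL_TB=1000   # assumption: 1 PB treated as 1000 TB usable

# 90% and 80% come from the ZFS Primer quote, 50% is the block storage advice.
for pct in 90 80 50; do
    limit=$(threshold_tb "$POOL_TB" "$pct")
    free=$(to_free_tb "$POOL_TB" "$POOL_TB" "$pct")
    echo "${pct}%: ${limit} TB used at most, free ${free} TB from a full pool"
done
```

Under that assumption, a completely full pool would need roughly 200 TB removed, or that much capacity added, to get back under the 80% mark.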