Hello All!
I am running TrueNAS Scale and have been running it for about 2 years at this point. Approximately a year ago I finally got everything running the way I wanted, and I kind of walked away and dove into other projects I wanted to work on (classic mistake for a tech geek with ADHD like me). Fast forward about 6 months and I had some sort of drive error that put my SSD pool into a degraded state, but it was still running and working fine since it was only one drive. I ordered a replacement drive, but when I went to swap it in and started up the system, the drive that had supposedly failed was working with no issues again. It ran for a while, everything fine, then after a few shutdowns and restarts, boom, it's acting like it's bad again. I replaced the drive with the new one, and I kept having issues with the pool randomly going into a degraded state. This is where I made my biggest mistake: somewhere along the way, seeing as the pool was still functioning, I left the 5th drive (5-wide RAIDZ1) that seemed to be the issue out, crossed my fingers, and planned to back up the data. But life happened, I never went back to add the drive back in, and the system kept running without issue.
So here we are another 6 months later, and we have an area-wide power outage. TrueNAS fails to automatically come back up, so I manually restart it, but this is when I notice another drive showing a DEGRADED state, my apps are not functioning, etc. I restarted via the web GUI and it won't come back. Yesterday I spent half the day trying to get it to boot with no success (I may have been using the wrong instance??), but then out of the blue it booted up and I am somewhat back to functioning as I was before the power went out. Most apps are up and running and the VM that boots on startup seems to be running, but some apps are not working and I still have the drive issue.
Basically, I realize I am a total jackass for doing this the way I did, and honestly I will be very surprised if I have not totally lost all my data on the SSD pool. I totally understand that and make no excuses, but now I am trying to figure out how best to move forward, save anything I can, and get my server back to a state where I want it and can keep up with regular updates, etc.
Obviously a 5-wide vdev with only 3 functioning disks is totally lost, and I know and understand that. What confuses me, though, is the error message I am getting and what things look like in the storage dashboard…
Any suggestions, explanations, tips on how to best move forward?
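For anyone following along, here's a quick sketch of the redundancy math I'm going by. It assumes the usual rule that a RAIDZ vdev with parity level p (p = 1 for RAIDZ1) can still serve all its data as long as no more than p member disks are unavailable, and that a disk ZFS reports as DEGRADED is still online and counts as available (only disks that are missing, removed, faulted, or unavailable count against parity). The function name and the status strings are made up for illustration; they are not anything TrueNAS or ZFS actually exposes.

```python
def raidz_vdev_state(width: int, parity: int, unavailable: int) -> str:
    """Hypothetical helper: rough health of a RAIDZ vdev.

    Assumes a RAIDZ vdev with `parity` parity disks can rebuild all data
    as long as no more than `parity` of its `width` member disks are
    unavailable. Disks that ZFS reports as DEGRADED are still online,
    so they should NOT be counted in `unavailable`.
    """
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity must be 1, 2 or 3")
    if not 0 <= unavailable <= width:
        raise ValueError("unavailable must be between 0 and width")
    if unavailable == 0:
        return "healthy"
    if unavailable <= parity:
        return "degraded"  # data still readable, but redundancy is reduced or gone
    return "failed"  # more missing disks than parity can cover


# My SSD pool: 5-wide RAIDZ1
print(raidz_vdev_state(width=5, parity=1, unavailable=1))  # one drive pulled -> degraded
print(raidz_vdev_state(width=5, parity=1, unavailable=2))  # two drives gone -> failed
```

So by this logic, if the second "bad" SSD is only flagged DEGRADED but still online, the pool would be running with one disk missing and zero redundancy left, rather than being gone outright.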
TrueNAS Version: 22.12.3.1
Motherboard: Asus TUF Gaming X570-PLUS(WI-FI)
RAM Qty: 128 GB (4 sticks)
CPU Make/Model: AMD Ryzen 9 5950X
NIC: Motherboard NIC plus a couple of USB 2.5 GbE external NICs
Hard Drive(s) Make/Model:
5x Team Group EX2 2 TB SSDs
5x 10 TB spinning drives
2x 112 GB NVMe (metadata vdev for the spinners)