Main pool not persisting after reboot, rebuilt multiple times

I’m running a TrueNAS Mini X+ on 25.10.2.1, with five 4TB WD Red SSDs in a RAID1 configuration. Late last week, one of the drives reported errors, so I began the process of replacing it. For some reason, the replacement would not go through, so (after making sure backups were in place) I rebooted to ensure everything was fresh.

When the system came back up, additional disks were missing, destroying the pool. I removed it, and then re-created it, adding the disks back to the pool, and restored my data.

After the restore, I dismounted my backup to be safe. When I rebooted to be sure everything was working, all of the drives were missing from the pool - the pool still existed, but had no media in it. The drives were all present, but not attached to the pool, and I could not re-import them, despite the drives being listed as having the pool on them.

I destroyed the pool, re-added the drives again, and rebooted (before attempting a restore). This time, the drives persisted. I’m restoring my data again, but I am concerned that after this, the same issue will happen - the drives will mysteriously not be in the pool after the reboot, and I’ll be back at square one.

Any guidance on what may be causing this, and how to resolve it?

@joeschmuck Drive Troubleshooting Flowcharts

You can try working through this flowchart by another member. I don’t know how the drives connect to the motherboard on that model, but checking for bad cables or reseating the drive and power cables may help. What kind of errors were you seeing? Posting the output of zpool status -v would also help us see what is going on.
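If it helps, here is a rough sketch of how that output could be gathered into one file for posting. The function name collect_pool_info and the default file name are made up for illustration. It only assumes the standard zpool status -v and zpool import commands, where a bare zpool import just lists pools that are visible on disk but not imported. You may need to run it as root or with sudo on your system.

```shell
#!/bin/sh
# Hypothetical helper (name and default file name are illustrative only):
# collect read-only pool diagnostics into a single text file.
collect_pool_info() {
    out="${1:-pool-report.txt}"

    # Bail out early if the zpool command is not available.
    if ! command -v zpool >/dev/null 2>&1; then
        echo "zpool not found on PATH" >&2
        return 1
    fi

    {
        echo "== zpool status -v =="
        # Pool health, per-device error counters, and any files with errors.
        zpool status -v
        echo "== zpool import =="
        # With no arguments this only lists importable pools; it imports nothing.
        zpool import
    } >"$out" 2>&1

    echo "wrote $out"
}
```

Calling collect_pool_info report.txt and pasting the contents of report.txt into a reply should be enough. Nothing in it changes the pool.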

You have missed reading Joe’s Rules. They take only a few minutes to read, and they will help you give us the data we need to provide you excellent advice.

At this time, it sounds like an unstable computer, based on what little I have to go on. But if the drives are old, it could be that too. We just need some more data to help you properly.

Of course, now that I posted the problem has vanished, and I can’t re-create it.

I’ll give it a day or two, make a fresh backup of my data, and reboot again to see if the problem comes back.

Don’t you hate that? It’s like taking a sick child to the doctor, and by the time the appointment starts, the child is feeling better. Why is that?

What is your boot drive and where is your System Dataset?
And then next, for the heck of it, run MemTest86+ for at least 5 full passes, just to ensure you don’t have something wrong with your RAM.

I work in IT for a day job. I get it all the time on help calls - “It was doing it until I called you, I swear!” The best was when I got such a call from a car mechanic… “Yeah, now you know how I feel when I bring my car in!”

The boot drive is the internal NVMe drive. I think the System Dataset may have been the issue: after a few tries at this, I exported all external storage before recreating the main dataset, and then set it as the main one. Once that was done, everything persisted between reboots.

I got a good backup to my offsite drive last night, and I’m re-backing up to my onsite drive now. All data is intact, Apps are recreated, and my webserver VM came back without issue. What a way to know your backup plan works…

I’m having a minor issue with home shares, but I’ll make a new thread for that.