Zpools randomly went offline after CPU upgrade

Hi! I swapped my Epyc CPU with an inferior one, so I could move the better one to an app server.

While I was screwing in the new processor, the NAS turned on. I quickly turned it off, but now when I turn it on, even after a restart, it’s saying the two data zpools are gone. The only pools showing up are the boot-pool and TrueNAS apps pool:
image

My processor is showing up correctly:
image

My drives are all here and correctly showing which pool they belong to:
image
image

Actually, that’s not true. There should be two more NVMe drives. I’m gonna look into that. Either way, those are for the Bunnies pool, not the Wolves pool. I’m wondering why these two pools suddenly went offline and won’t come back online.

Side note: I did remove two 2TB drives from the NAS as well. I’m gonna re-add those. I wonder if they were somehow in use for the Wolves pool. I was pretty sure that section of the NAS had drives that weren’t in use, but it’s possible I messed up and just happened to remove the two drives that were part of Wolves.

Does TrueNAS just OFFLINE zpools if something’s missing?

Looking at my drive chart, it appears as though two drives in this section are part of the Wolves pool:

Not sure those are the ones I removed, but I’ll move them back if that’s the case. Even if I fix this issue tonight, I wanna leave this topic here in case anyone else notices the same thing.

I believe this will happen if something goes missing WHILE the server is turned off.

I.e., it protects you from mounting a pool when you forgot to re-install a disk.

Reinstall the missing disks. You can use “zpool import” in the shell to see what’s missing.

Then try again…

OR force the mount.
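
For anyone finding this later, here is a minimal sketch of the "see what's missing" step. Run with no arguments, zpool import only lists pools that are visible but not imported, along with the state of their vdevs; it does not import anything. The list_importable function name and the ZPOOL override variable are made up for illustration (the override just lets you dry-run it with echo).

```shell
#!/bin/sh
# Hypothetical helper: list pools that are available for import.
# ZPOOL is an illustrative override, not a real ZFS setting.
list_importable() {
    # "zpool import" with no pool name only reports; it imports nothing.
    "${ZPOOL:-zpool}" import
}

# Only run the real command when zpool is installed on this machine.
if command -v zpool >/dev/null 2>&1; then
    list_importable
fi
```

Devices that cannot be found should stand out in the per-vdev listing, which tells you which disks to go looking for before trying anything forceful.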


I re-added those Wolves 2TB drives and made sure all the NVMe SAS-style connectors are snug in the drives. Those connectors are very prone to coming out because you looked at them wrong.

After restarting TrueNAS, Wolves is fixed, but Bunnies still has issues:
image

I’ll see if I can reseat those NVMe cables one more time.

zpool import should tell you which vdev and which disk is missing.

It might only be able to tell you the ID of the disk, but that should at least give you a hint where to look.

Thanks @Stux!

Yeah, those two NVMe drives are the remaining ones I need to figure out:

Once I get them showing up, are you saying I don’t have to restart, and that there’s a way to bring the pool online? After plugging in those drives for Wolves, the pool didn’t come online until I restarted TrueNAS.

Well, yeah, once you get them to show up (actually just one of them), you should be able to import the pool in the GUI.

But I would just restart, personally.

The trick to mounting a pool in the CLI, as if it were imported via the GUI, is to import it with -R /mnt, IIRC.
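
A rough sketch of that CLI import, assuming the pool is named Bunnies as in this thread. As I understand it, -R sets an alternate root for the pool's mountpoints, and /mnt is where TrueNAS expects pools to live. The import_under_mnt function name and the ZPOOL override variable are hypothetical, there only so the command can be dry-run.

```shell
#!/bin/sh
# Hypothetical helper: import a pool with its mountpoints rooted at /mnt.
# ZPOOL is an illustrative override, not a real ZFS setting.
import_under_mnt() {
    pool="${1:?usage: import_under_mnt POOLNAME}"
    # -R /mnt sets the pool's altroot so datasets mount under /mnt.
    "${ZPOOL:-zpool}" import -R /mnt "$pool"
}

# Dry run: prints the arguments instead of touching any pool.
ZPOOL=echo import_under_mnt Bunnies
```

A pool imported this way may not be registered with the TrueNAS middleware, so importing through the GUI or simply restarting is still the safer route.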


Thanks! It seems those cables might not have been the issue.

After changing out the CPU, some UEFI options got reset. SuperMicro…

I had to change JNVMe1/SATA to NVME (not AUTO) to fix the issue of those two drives not showing up:

This system is 100% full in every slot, so it’s sensitive to changes like these.


They’re back!

It was a combination of 2 issues:

  1. For Bunnies, changing out the CPU reset the NVMe setting for my SlimSAS ports, which made two NVMe drives unavailable (the ports were loading in SATA mode).
  2. For Wolves, I had taken out the wrong 2TB drives. The ones I pulled were used for metadata.

:face_with_monocle:

If I can make a suggestion for future hardware maintenance: please unplug your system entirely from the power source before changing non-hot-swap components like memory or processors. Especially in systems that have a BMC (baseboard management controller), some components can still be energized, or the BMC may be reading information from them for inventory.


Yeah, I noticed that after swapping some RAM. I’m used to turning it off from the PSU, but these server PSUs aren’t switched. My bad.