Shutdown and Pull the drive

I used iDRAC to reboot the TrueNAS server. It came up and gave the notification that it's resilvering. I was following the logs when I saw that it was shutting down.

I am guessing you don't have a current system configuration backup at this time.

Did you ever figure out what and where the boot disks were, and which one was considered 'good' and which 'bad'?

Do you have a boot devices menu you can access, to try selecting the other boot disk in the mirror? I'm guessing it may be booting from the failing disk and the system doesn't try the good one.

If you get it booted, I would get a system configuration backup. We can do a fresh install of the same TrueNAS version on the boot drives and then reload the configuration file, if necessary.

I have the current system configuration file. I was going to work on the boot disk issue this weekend. When it comes up, I'll try to download the config again. But iDRAC isn't giving me the option to select a boot device, or I don't see it. I'm not too familiar with iDRAC.

I was able to download the current config and I am restarting it.

When you say point, do you just mean selecting a new boot device, like dual-booting Windows and Linux?

Yes, the systems don't automatically fail over to the other boot device when mirrored. If the BIOS is pointed at boot1, you may have to point it to boot2, like when you have to boot a Windows PC from a USB stick or CD. You might have to change the boot order of the devices.

Damn. Okay. Since I have the current config file, I can do a fresh install of TrueNAS, boot it, and then restore from backup. Is there any issue I might run into?

I would prefer getting the system to boot from one of the current boot devices. The reinstall was a backup or last resort if both boot drives died in that pool. It saves you having to set up everything like networking, shares, etc.

Woah. Shouldn't the config file contain all that information?

The configuration file is supposed to have all the info. I am not familiar with your hardware, and your HBA looked to be in 'IR Mode'. We didn't get serial number info, etc. when you ran the command in post #3.

Let's see if I can get an experienced second opinion. @HoneyBadger Asking for help or second opinions. RAID-Z3 with a faulted disk and boot-pool with a faulted disk. HBA appears to be in 'IR Mode' since I didn't get serial # in post 3 commands.


I only see 1 USB device.

You might be using a regular HD and a USB device mirrored together for the boot-pool. It could even be a USB SSD or hard drive. Some people use those instead of a USB thumb drive.

In post 10 you asked a question about da13 and it being a USB device. What did that info look like for da12, the other drive in the boot-pool?

I am not sure I am following this part:
"What did that info look like for da12, the other drive in the boot-pool?"

Have you identified which two devices the boot pool uses? One looks to be the USB. What is the other one? Are there 2 USB thumb drives for booting?

I also don't know why the system was shutting down while resilvering.

I don't know how we would attempt a clean install if you can't change the boot device order either. We have to boot from a TrueNAS install USB to attempt it.

Take pictures if it helps.
We badly need a hardware description.

        NAME                                            STATE     READ WRITE CKSUM
        Tank1                                           DEGRADED     0     0     0
          raidz3-0                                      DEGRADED    65     0     0
            gptid/a980e29d-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/aa322e75-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/7a7cb10b-6720-11ec-9fc6-246e962dd6b0  ONLINE      66     0     0
            gptid/ab23c2bf-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/aba1bae8-3d83-11ec-8aeb-246e962dd6b0  FAULTED    198     0     0  too many errors
            gptid/ad2f9f83-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/acf3256f-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/ae202d16-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/adcee7d1-3d83-11ec-8aeb-246e962dd6b0  ONLINE      67     0     0
            gptid/ad9e9258-3d83-11ec-8aeb-246e962dd6b0  ONLINE      67     0     0
            gptid/addb8a6e-3d83-11ec-8aeb-246e962dd6b0  ONLINE       0     0     0
            gptid/fcf1f4f7-68dd-11f0-a3d1-246e962dd6b0  ONLINE       0     0     0
        cache
          gptid/ae4aff35-3d83-11ec-8aeb-246e962dd6b0    ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: resilvered 3.68G in 00:20:41 with 0 errors on Thu Jul 24 18:46:55 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da13p2  ONLINE       0     0     0
            da12p2  ONLINE       0     0     0

@etorix
The server is a PowerEdge R720xd.

It looks like you fixed the boot-pool. It's not degraded.

In your pool Tank1, I don't see gptid/ae4b430b-3d83-11ec-8aeb-246e962dd6b0 FAULTED as it was above. Did you replace a disk? You are now showing a lot of read errors and one FAULTED disk.

@NickF1227 Seeing if you have any input on this thread and latest post.

That's what's driving me nuts. I replaced the faulted disk for gptid/ae4b430b-3d83-11ec-8aeb-246e962dd6b0 and now it's showing a lot of read errors.

About the drive showing 198 read errors: did it take the physical place of the other one? The pattern looks like cabling or backplane, just guessing. Are the fans working? We have seen controllers getting hot and causing errors before too.
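
To make the "cabling or backplane" guess a bit easier to see, here is a minimal sketch that tallies the READ column from the raidz3 rows posted above. The function name `tally_read_errors` is made up for illustration, and counting only rows whose name starts with `gptid/` is an assumption that happens to fit this pool's output. It only reads pasted text and never touches the pool.

```shell
#!/bin/sh
# Tally `zpool status` member rows by their READ error count.
# Reads status text on stdin; prints "<read_errors> <number_of_devices>".
# Assumption: member rows start with "gptid/", as in the output in this thread.
tally_read_errors() {
    awk '$1 ~ /^gptid\// { count[$3]++ }
         END { for (r in count) print r, count[r] }' | sort -n
}

# The twelve raidz3-0 member rows, copied from the status output above.
tally_read_errors <<'EOF'
            gptid/a980e29d-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/aa322e75-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/7a7cb10b-6720-11ec-9fc6-246e962dd6b0  ONLINE      66     0     0
            gptid/ab23c2bf-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/aba1bae8-3d83-11ec-8aeb-246e962dd6b0  FAULTED    198     0     0  too many errors
            gptid/ad2f9f83-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/acf3256f-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/ae202d16-3d83-11ec-8aeb-246e962dd6b0  ONLINE      66     0     0
            gptid/adcee7d1-3d83-11ec-8aeb-246e962dd6b0  ONLINE      67     0     0
            gptid/ad9e9258-3d83-11ec-8aeb-246e962dd6b0  ONLINE      67     0     0
            gptid/addb8a6e-3d83-11ec-8aeb-246e962dd6b0  ONLINE       0     0     0
            gptid/fcf1f4f7-68dd-11f0-a3d1-246e962dd6b0  ONLINE       0     0     0
EOF
```

For the rows above this prints `0 2`, `66 7`, `67 2` and `198 1`, one pair per line. Ten of the twelve members report read errors and nine of those sit at 66 or 67, which is why a shared component looks more likely to me than ten drives failing at once. That is still only a guess until we know the hardware.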

Do you know if this PowerEdge R720xd is stock? I was thinking we might be able to use the Dell service tag to get a list of equipment it shipped with.

Otherwise please try to run the following. You might need to put 'sudo' before each command. Please post back with the output as preformatted text, one box per command. I don't expect you to get results for everything, but it may sort out what hardware you have, the controller firmware, etc.

lspci

lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME

sudo sas2flash -list
sudo sas3flash -list
sudo storcli show all
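
If it is easier than typing them one at a time, the commands above could be wrapped in a small script like the sketch below. `run_report` is a made-up helper name, and I am assuming the script itself is run as root (or through sudo), so the individual commands are not prefixed with sudo. Tools that are not installed are skipped rather than treated as failures, since I don't expect all of them to exist on your system.

```shell
#!/bin/sh
# Run one diagnostic command and print its output under a header, so each
# result can be pasted into its own preformatted box.
# "$1" is the program name; the remaining arguments are passed through.
run_report() {
    printf '===== %s =====\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        # Keep going even if the tool exits with an error.
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(%s not found on this system, skipped)\n' "$1"
    fi
}

run_report lspci
run_report lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
run_report sas2flash -list
run_report sas3flash -list
run_report storcli show all
```

Saved as, say, `collect_hw.sh` (a hypothetical file name), it could be run with `sudo sh collect_hw.sh > hw_report.txt`, and each section of the report pasted into its own box.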

I'll get these details today. I have another TrueNAS server that was working fine and all of a sudden 6 disks are showing as DEGRADED.