TrueNas Scale with RAIDZ1 "Degraded" and one disk "Removed"

HoneyBadger · November 28, 2024, 4:18pm

Physical drive is definitely toast, as shown in picture #1.

Picture #2 shows that you have four drives in four logical RAID0s which is also not ideal.

If you install the drive into slot 2, and don’t configure a Logical Drive, check to see if it is visible in your TrueNAS installation. If so you can use the REPLACE option - and you may have a path to migrate drives one-at-a-time to “passthrough/unconfigured” mode, and then flash to IT mode.

Follow-up question: If you look at the Details of your controller on that page or others, does it give you any information about “write cache” or “cache behavior”? I want to ensure that you are not only using safe write behavior but also getting maximum performance.

Daviderusso93 · November 28, 2024, 4:28pm

In the controller details I can see only this information:

[Screenshot: 2024-11-28-17-19-57-Creativespace-PC-HP-Any-Desk]

I also noticed that the “Disk” section of the TrueNAS web UI lists:

sda (Pool HDD)
sdb (Pool HDD)
sdd (Pool HDD)
sde (SSD with boot pool)
sdf (Pool HDD)

The failed disk was originally “sdc” (Right?)
In the dashboard I see that I have an “Unassigned Disk”.
Is it possible that you are actually seeing the disk?

Daviderusso93 · November 28, 2024, 4:30pm

If I try to replace the “Removed Disk” I see “sdf”.
Can I proceed?

HoneyBadger · November 28, 2024, 4:33pm

Definitely IR mode here.

What happened is that the previous RAID0 logical disk presented through the IR mode on your card had an old serial number assigned to it by the RAID card. This S/N and UUID mapped it to sdc. When the physical disk was reinserted and/or reinstalled, the logical disk is now “dead” because it was RAID0 (no redundancy) - but the underlying physical disk is now getting passed through to TrueNAS, which is seeing its genuine S/N and has probably given it a new UUID - which is mapping to sdf.

Replacing the missing sdc with the new sdf should work; however, if you do this, you don’t want to “initialize” or otherwise define a new logical disk in the RAID BIOS as that will wipe it again.

As you’ve only used 1.6T or so on this filesystem I’d like to ask - how challenging would it be to back up all data here and rebuild again?

Edit: I’ve rehosted your images locally. Can you please upload them directly to the forum in any future posts for auto-resizing and easier access by all? You can drag-and-drop them right into the compose window.

Daviderusso93 · November 28, 2024, 4:37pm

I’ve read your note about the images. I hadn’t found the right button, but now I know where it is.

I can try, but I will definitely need a few days because, as I was saying, the physical hardware is located in a separate office far from my location.
In the meantime, can I try to replace the disk with sdf, so that I have the redundancy I need for the few days it will take me to reach the headquarters?

How do you recommend rebuilding the TrueNAS system to make optimal use of the hardware?

Thank you very much for the information, this is very important for us users with little experience on this system.

HoneyBadger · November 28, 2024, 4:47pm

At the moment, using the REPLACE option to replace sdc with sdf should work, to regain your redundancy.

My recommendation would be to back up all of your data, flash the card into IT mode (or replace it with one that is already in that mode) and then rebuild your pool and restore the data.

You may be able to do it one disk at a time by the same process you have done here (hotplugging a drive, and then replacing the failed disk with the new passthrough-mode one) but that will put data at risk every time unless you have a spare disk to use as the “pivot mode.”

1. Back up all data.
2. Did you back up all of your data?
3. Flash the CP400i to IT (Initiator/Target) mode as described by the post SmallBarky links to (TrueNas Scale with RAIDZ1 "Degraded" and one disk "Removed" - #9 by SmallBarky)
4. Boot TrueNAS again - see if it detects your pool (unlikely)
5. If your pool is missing and/or your disks show as unused, make the pool again, and restore the data.

Daviderusso93 · November 28, 2024, 4:55pm

Ok thanks for your precise information.

I tried adding sdf but it can’t seem to find the disk.
Do I need to wipe the sdf disk before replacing the failed one?

At this point I will try to proceed as suggested and rebuild the RAID with the PRAID card in IT mode, but I need more time…

One last question: can I also try the option of adding the disk to the existing pool from the “Storage Dashboard” without carrying out the “Replace” (which doesn’t seem to work)?

HoneyBadger · November 28, 2024, 5:04pm

I would be hesitant to do a wipe - the additional layer of abstraction of the RAID card here is posing a challenge to TrueNAS being able to identify the disk correctly, so I would be concerned that disks have “shuffled” in a manner of speaking, and the loss of a second disk through a mis-targeted wipe would put the entire system offline.

No - this flow is designed for adding new disks as additional storage to the pool, not replacing an existing failed disk.

What does the output of lsblk show from the command-line? Using the </> button in the composition window for “Preformatted text” or placing the text between triple-backticks

```
text
```

makes it easily readable.
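
If it helps, here is a minimal Python sketch of what I would look for in that output: any whole disk that reports a size of 0 bytes. It assumes `lsblk` is run with `--json --bytes -o NAME,SIZE,SERIAL,TYPE`; the helper name `zero_size_disks` and every value in `SAMPLE` are made up for illustration and are not taken from this system.

```python
import json
import subprocess
from typing import List


def zero_size_disks(lsblk_json: str) -> List[str]:
    """Return names of whole disks that lsblk reports as 0 bytes."""
    devices = json.loads(lsblk_json).get("blockdevices", [])
    return [
        dev["name"]
        for dev in devices
        # Older lsblk versions emit sizes as strings, so normalise with int().
        if dev.get("type") == "disk" and int(dev.get("size") or 0) == 0
    ]


def read_lsblk() -> str:
    """Run lsblk on the live system (not exercised by the sample below)."""
    cmd = ["lsblk", "--json", "--bytes", "-o", "NAME,SIZE,SERIAL,TYPE"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Hypothetical sample only: names mirror the thread, sizes and serials are invented.
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "size": 4000787030016, "serial": "EXAMPLE-A", "type": "disk"},
  {"name": "sde", "size": 240057409536, "serial": "EXAMPLE-E", "type": "disk"},
  {"name": "sdf", "size": 0, "serial": null, "type": "disk"}
]}
"""

if __name__ == "__main__":
    print(zero_size_disks(SAMPLE))
```

A disk that shows up here is visible to the kernel but not usable as a replacement target, so I would not expect REPLACE to accept it.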

neofusion · November 28, 2024, 9:47pm

Just putting it out there, but your controller is running rather hot at 80 degrees C, perhaps your drives are also toasty?

Daviderusso93 · November 29, 2024, 7:38am

I apologize for the delay, but I was out of the office and just got back.
Here is the output:

[Screenshot: 2024-11-29 08_37_23-Creativespace PC HP - AnyDesk]

sdf actually has no size, so in my opinion it is definitely broken and needs to be replaced.

Daviderusso93 · November 29, 2024, 7:45am

I have other servers like this, and their controllers are all between 60 and 70 degrees Celsius. This is the only one that reaches 80.

However, I have already seen operating temperatures like this in other machines and they have not created too many problems. The environment where the servers are located is air conditioned and the machines are cleaned of dust regularly.

Maybe I could add extra ventilation, but it would be something self-built, and I don’t know how effective it would be.

Protopia · November 29, 2024, 10:21am

If the sdf disk is a new replacement for the one that failed, then I guess you can go ahead and do a replace. However, if it is the same disk that failed, now presented natively, then before you replace you should probably:

- Do some basic functionality tests on it to make sure it is working and can be read and written to.
- Do a SMART long test on it.
- Check out the SMART attributes.
- Perhaps do a stress test.
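
As a rough sketch of the SMART steps above, the following Python builds the `smartctl` invocations I have in mind: `-t long` starts the long self-test, `-l selftest` shows the self-test log once it has finished, and `-A` prints the attributes. The helper name `smart_check_commands` is hypothetical, the device path is only an example, and a disk that still sits behind the RAID card may also need a `-d` device-type option that is not shown here.

```python
import subprocess
from typing import List


def smart_check_commands(device: str) -> List[List[str]]:
    """Build the smartctl calls for a long test and an attribute review."""
    return [
        ["smartctl", "-t", "long", device],      # start the long self-test
        ["smartctl", "-l", "selftest", device],  # read the self-test log afterwards
        ["smartctl", "-A", device],              # print the SMART attributes
    ]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in turn; needs root and a real disk, so not run here."""
    for cmd in commands:
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # /dev/sdf is an assumption based on the device name used in this thread.
    for cmd in smart_check_commands("/dev/sdf"):
        print(" ".join(cmd))
```

The long test runs in the background on the drive itself and can take many hours on a large disk, so the log only becomes meaningful after it completes.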

neofusion · December 1, 2024, 3:04pm

While I led with the controller temp, my main concern was actually the temp of your drives.
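
One way to check drive temperature, sketched in Python under some assumptions: it reads the text printed by `smartctl -A` and pulls the raw value of attribute 194 (`Temperature_Celsius`). Not every drive reports that attribute, the helper name `drive_temperature_c` is hypothetical, and the sample line is invented for illustration rather than taken from these disks.

```python
from typing import Optional


def drive_temperature_c(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 194, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten or more columns; the raw value is the tenth.
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


# Hypothetical sample row in the layout smartctl uses for its attribute table.
SAMPLE = (
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      "
    "UPDATED  WHEN_FAILED RAW_VALUE\n"
    "194 Temperature_Celsius     0x0022   036   053   000    Old_age   "
    "Always       -       36 (Min/Max 20/53)\n"
)

if __name__ == "__main__":
    print(drive_temperature_c(SAMPLE))
```

With the disks still behind the RAID card, `smartctl` may not reach them directly, in which case this returns `None` rather than a reading.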