Hi all.
Today one of my TrueNAS SCALE systems reported a problem with its RAIDZ1 pool, which is now in a “Degraded” state.
As the screenshot shows, one of the four installed disks is in the “Removed” state.
The disk is correctly inserted and is still seen by the PRAID CP400i card.
I know I’m in a precarious situation, since if I lose another drive I will lose the data, but unfortunately this NAS is off-site, and it will be a few days before I can physically access it.
Is there anything I can do remotely to try to get the disk online again?
Apologies, but I’ve only been using SCALE for a short time and this is the first time this problem has happened to me.
Before anyone loses their mind here, this is an OEM (Fujitsu) SAS3008-based card.
@Daviderusso93 can you post the output of sudo sas3flash -list from a shell prompt? If you do not have SSH enabled (or cannot enable it) then you can find a web-shell at System → Shell.
```
admin@truenas[~]$ sudo sas3flash -list
[sudo] password for admin:
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.
No Avago SAS adapters found! Limited Command Set Available!
ERROR: Command Not allowed without an adapter!
ERROR: Couldn't Create Command -list
Exiting Program.
```
No, I haven’t tried pressing Online to bring the disk back online. I preferred to ask first, as this is my first experience replacing a disk in ZFS.
Do you think it’s a good idea to try?
It’s not showing any previous errors or flagging the disk itself as degraded, so attempting to ONLINE it with the button should either succeed or throw a “device not present” error, at which point we go back to SSH and ask lsblk what it thinks is present.
Since you said “correctly inserted” rather than “correctly connected”, I’m assuming this is a hot-swap chassis, in which case a cabling issue is unlikely.
When you mention that the drive is “seen by the PRAID CP400i”, where are you seeing that: in a boot-time configuration menu, or elsewhere?
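Before pressing Online, it can help to confirm from the shell which pool members ZFS itself considers unhealthy. Below is a minimal sketch of a hypothetical helper, `list_not_online` (not part of TrueNAS). It assumes the usual `zpool status` device table (NAME STATE READ WRITE CKSUM), where the second column of each device row is its state, and prints every device reported as REMOVED, FAULTED, UNAVAIL or OFFLINE.

```sh
#!/bin/sh
# list_not_online (hypothetical helper): reads `zpool status` output on stdin
# and prints the NAME of every device whose STATE column is REMOVED, FAULTED,
# UNAVAIL or OFFLINE. Assumes the standard NAME STATE READ WRITE CKSUM layout;
# header lines such as "state: DEGRADED" never match because their second
# field is not one of these keywords.
list_not_online() {
  awk '$2 ~ /^(REMOVED|FAULTED|UNAVAIL|OFFLINE)$/ { print $1 }'
}
```

Usage: `sudo zpool status | list_not_online`. If the same disk is also absent from `lsblk -d -o NAME,MODEL,SERIAL`, the OS genuinely cannot see it, and onlining it is expected to fail.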
I can see it in the onboard Fujitsu iRMC interface, under the PRAID information:
DISK 2 - Failed Status
In the TrueNAS SCALE interface, I see one unassigned disk on the Storage Dashboard.
I tried clicking Online on the Storage Dashboard, and it reports that the disk is not available.
The disk is probably broken.
For the replacement, I see there is a “Replace” button in the disk info for the RAIDZ1 vdev.
Yeah, if onlining it fails, go for a replace. Once you’re golden again, though, it may be worth checking whether your HBA is actually in IT mode, because it’s concerning that sas3flash ain’t picking it up.
Be cautious here. It seems likely that this controller is still in IR (Integrated RAID) mode and may have the disks configured as “virtual disks” in its own BIOS. Until we know whether the drives are being passed through or not, don’t do any “crossflashing”. Is there a “virtual drive” in the iRMC/PRAID at all, or are all four disks set up as “unconfigured/passthrough”?
If your iRMC/PRAID BIOS is telling you the disk has failed, it has likely been marked failed at that level (by the card’s IR firmware), and therefore the card isn’t passing it up to the OS.
If the disks are unconfigured, that’s a good thing: LSI IR controllers have historically passed unconfigured disks through without any “virtual disk” layer, which bodes well for simply swapping the disk physically.
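One way to check which firmware personality the card is running, without a reboot, is to see which Linux kernel driver has bound to it. As an assumption worth verifying on the box itself: SAS3008 cards running IT or IR firmware (the kind sas3flash talks to) normally bind to `mpt3sas`, while MegaRAID-firmware controllers bind to `megaraid_sas`. The hypothetical helper `classify_sas_driver` below is a sketch, not a TrueNAS or Broadcom tool; it reads `lspci -k` output and reports the bound driver for each device line that mentions SAS or RAID.

```sh
#!/bin/sh
# classify_sas_driver (hypothetical helper): reads `lspci -k` output on stdin
# and reports which kernel driver is bound to each SAS/RAID controller.
classify_sas_driver() {
  awk '
    # A device line begins with a PCI address such as 01:00.0
    /^([0-9a-f]+:)+[0-9a-f]+\.[0-9a-f]/ {
      dev = $0
      want = ($0 ~ /SAS|RAID/)   # only track storage controllers
      next
    }
    # The indented "Kernel driver in use:" line names the bound driver.
    want && /Kernel driver in use:/ {
      drv = $NF
      if (drv == "mpt3sas")           kind = "mpt3sas (IT/IR firmware; sas3flash should see it)"
      else if (drv == "megaraid_sas") kind = "megaraid_sas (MegaRAID firmware; sas3flash will not list it)"
      else                            kind = drv " (not recognised by this sketch)"
      print dev " -> " kind
      want = 0
    }
  '
}
```

Usage: `lspci -k | classify_sas_driver`. A `megaraid_sas` result would be consistent with sas3flash finding no adapter and with the disk being failed at the controller level rather than by ZFS.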
Re: inability to post images, that’s a new-user measure. Let me see about fixing that for you right quick.
Can you post a picture from the iRMC/remote management console now?
If they are configured as four logical RAID0 disks, then that may be the root of the problem. Does it show the physical disk as OK/ONLINE but the logical disk as FAILED?