Sadly, only your memory can confirm that. I swear that, from what I remember of TrueNAS CORE, the disks 100% showed individually.
Yes, we’d just undo the passthrough in the HBA settings, but at this point I don’t want to comment on whether this would make things worse if the pool has ALWAYS been presented to TrueNAS as a single disk. I’m not going to recommend this any further - this territory feels outside of what I’d normally be comfortable suggesting if your pool has always existed as a single drive to TrueNAS.
If we do have ZFS mixed with a RAID controller, I’m not comfortable confirming anything. To me, there is nothing inherently risky about exporting a pool (as long as you don’t set the delete data option in the command) & then re-importing it; however, I’m no longer comfortable advising on next steps & defer to those smarter than me to give you advice.
Edit: HoneyBadger & Stux coming to the rescue - I defer to them entirely.
zpool import with no parameters will tell you that a pool can be imported - ONLINE in the status means it’s theoretically present, but as others have indicated you have a hardware RAID5 with 6x8T drives.
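For reference, a minimal sketch of pulling the pool name and state out of a bare `zpool import` listing. It assumes the usual `pool:` / `state:` lines in that output, and the helper name `import_states` is made up for illustration. Listing the pools this way does not import anything.

```shell
# Read `zpool import` output on stdin and print "<pool> <state>" per pool.
# Assumes the usual "pool: NAME" / "state: STATE" lines in that output.
import_states() {
  awk '$1 == "pool:"  { name = $2 }
       $1 == "state:" { print name, $2 }'
}

# Usage on the NAS (lists importable pools only, changes nothing):
#   zpool import | import_states
```

An ONLINE state here only says the single virtual disk is visible to ZFS, not that the import itself will succeed.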
The first step I would take is to restart the system and try to enter the H710p BIOS - check there to see if it offers any information on the single 40T “virtual drive” being online and healthy, as well as the backing physical disks.
Unfortunately with this configuration, ZFS can’t provide any redundancy across the six disks - it’s only able to see a single drive.
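As a quick sanity check on the numbers, here is a rough sketch of the usable-capacity arithmetic for a parity RAID set. It assumes nominal 8T drives and ignores TB/TiB rounding and any controller overhead; the function name `usable_tb` is hypothetical.

```shell
# usable_tb DISKS SIZE_TB PARITY_DISKS
# Nominal usable capacity of a parity RAID set: (disks - parity) * size.
usable_tb() {
  echo $(( ($1 - $3) * $2 ))
}

usable_tb 6 8 1   # single parity (RAID5), prints 40
usable_tb 6 8 2   # double parity (RAID6), prints 32
```

On nominal sizes, a single ~40T virtual disk built from six 8T drives lines up with single parity.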
Converting the controller to passthrough mode will result in the data definitely being unavailable.
The command that @Stux posted will also work - if it does permit the pool to be mounted (and it is visible in a zpool status -v) then that’s good - but I recommend cleanly shutting the system down and checking the BIOS regardless to see if there are any iDRAC warnings (a bad battery on your H710 perhaps)
There is something in the documents about the following. Maybe check its status too while in BIOS?
Controller Cache Preservation
The controller is capable of preserving its cache in the event of a system power outage or improper system shutdown.
The PERC H710, H710P, and H810 controllers are attached to a Battery Backup Unit (BBU) that provides backup power during system power loss to preserve the controller's cache data.
Cache Preservation With Non-Volatile Cache (NVC)
In essence, the NVC module allows controller cache data to be stored indefinitely. If the controller has data in the cache memory during a power outage or improper system shutdown, a small amount of power from the battery is used to transfer cache data to a non-volatile flash storage where it remains until power is restored and the system is booted.
Recovering Cache Data
The dirty cache LED that is located on the H710 and H810 cards can be used to determine if cache data is being preserved.
If a system power loss or improper system shutdown has occurred:
1. Restore the system power.
2. Boot the system.
3. To enter the BIOS Configuration Utility, select Managed Preserved Cache in the controller menu.
If there are no virtual disks listed, all preserved cache data has been written to disk successfully.
The command that @Stux posted will also work - if it does permit the pool to be mounted (and it is visible in a zpool status -v) then that’s good - but I recommend cleanly shutting the system down and checking the BIOS regardless to see if there are any iDRAC warnings (a bad battery on your H710 perhaps)
I’ve never accessed the iDRAC before, I’ll be in there in the morning but might need to set that up to gain access (?) or can I access all this through the CLI directly on the server too?
Should I try zpool import Photoshoots -R /mnt now or wait until I have checked the BIOS first?
Nothing to do with the RAID card there unfortunately. You may need to cable the iDRAC port into your network and let it pick up a DHCP lease.
Give the zpool import Photoshoots -R /mnt a try and see if it complains - if it tells you that you have corrupted data, you may need to look at rewinding.
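A minimal sketch of that attempt, wrapped so the outcome is obvious either way. The wrapper name `try_import` is made up for illustration; the pool name and the `/mnt` altroot are the ones used in this thread.

```shell
# Attempt a normal import under /mnt and report the outcome.
# The pool name defaults to the one used in this thread.
try_import() {
  pool="${1:-Photoshoots}"
  if zpool import -R /mnt "$pool"; then
    echo "imported: $pool"
    # Show pool health and any files with known errors.
    zpool status -v "$pool"
  else
    echo "import failed: $pool"
    return 1
  fi
}

# Usage: try_import Photoshoots
```

If it fails, the error text that `zpool` prints above the "import failed" line is the part worth posting back here.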
Nope, that’s what you don’t want to see unfortunately. You may have to check in the BIOS/UEFI to get into the PERC RAID card setup screen and have a look at what it thinks the state of your virtual and physical disks are.
Fault after power loss makes me think that perhaps the battery backup or the non-volatile cache on your controller are bad.
Morning. After a relatively sleepless night I’m heading in to the office now.
I’ll try to access the BIOS via reboot as per below but I’m not 100% sure what I’m looking for…
check there to see if it offers any information on the single 40T “virtual drive” being online and healthy, as well as the backing physical disks.
and …
You may have to check in the BIOS/UEFI to get into the PERC RAID card setup screen and have a look at what it thinks the state of your virtual and physical disks are.
I’ll check through the steps in SmallBarky’s post above too, re: the Controller Cache Preservation, then update.
Considering getting in touch with someone today that can help walk me through it but I did try a few businesses yesterday with not much luck. If any of you are available for a call today that would be a big help, I’ve a budget to cover time.
If not, could you recommend a UK-based company with expertise in this? I was looking at these (Contact | Haptic Networks) and will likely call this morning…
Thanks again for the help. It’s a massive support.
I now have access to the iDRAC thanks to the efforts of the tech engineer that came out this morning and the really helpful support tech from bargainhardware.co.uk where we initially bought the hardware from.
Unfortunately, there isn’t anything showing in the iDRAC logs about errors or faults. The discs are all showing as status ‘unknown’ see below.
The controller is showing that it’s set-up in RAID-6, which tallies up with the number and size of the drives vs what is showing as the size of the disc in TrueNAS too.
The support guy couldn’t really take this forward as there appears to be no error on his end. I’ve ordered a replacement controller and drive in case we find that either has failed, but the support guy seems to think that if I have to replace or reconfigure the controller then I’ll lose all the data. Could someone confirm?
Is there anything that I can investigate from the iDRAC or from BIOS (I’m in building with the server) to find out anything else that could help diagnose this?
I’ve spoken with about 5 companies, had people out to the building but haven’t moved much further forward.
Man I don’t want to sound like a jerk, but whoever originally set you up with a RAID card NOT flashed to IT mode, and also purposefully set to RAID-6 & then decided to feed it to ZFS screwed you. HARD. They have a fatal misunderstanding of how to set up ZFS on the most basic level.
I have no clue if it is possible now to recover anything because of this - which is why I said earlier that I have no advice I could give that wouldn’t risk everything & that I’m not comfortable advising further.
I don’t know if you have any chance to recover your data - if you do, back it up off of that setup asap. On the next run, please, use an HBA flashed to IT mode or just connect the drives to the motherboard & let ZFS handle making of raidz1/2/3/mirrors. Fully. It needs complete access to the drives.
SmallBarky gave you IX’s number. I’d say they are your only slim hope, but even then, the way this was originally set up was fatal.
Your individual drives are showing Online at least, but the status being missing and tagged as <?> for the controller itself is concerning. At the least, it’s showing the battery as Ready/Online as well.
Let’s get back into TrueNAS and try zpool import Photoshoots -fFn -R /mnt to simulate a “rollback import” - if it doesn’t throw the same integrity-check error, that means it might succeed here.
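Spelled out with the flags separated, the dry run looks roughly like this. It is a sketch only; the wrapper name `rewind_dry_run` is made up, and the options are placed before the pool name, which is the form the zpool man page documents.

```shell
# Dry run of a recovery ("rewind") import:
#   -f  force the import even if the pool appears to be in use
#   -F  recovery mode: try discarding the last few transactions
#   -n  with -F, only check whether recovery would work; do not perform it
#   -R  mount under the given alternate root
rewind_dry_run() {
  pool="${1:-Photoshoots}"
  zpool import -f -F -n -R /mnt "$pool"
}

# Usage: rewind_dry_run Photoshoots
```

Dropping `-n` from the same command is what would actually perform the rewind, so it is worth reading the dry-run output before going any further.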
If that takes, try a zpool status -v Photoshoots and post the results, then browse into the /mnt/Photoshoots directory and try to ls some contents. Find the most recent files or folders, see if the contents are readable. You may not be able to mount them as a network share but you can copy them off over SCP if you have SSH access.
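A rough sketch of those checks, assuming the pool mounted under `/mnt/Photoshoots`. The helper name `newest_entries` is made up, and the user, host and paths in the `scp` line are placeholders to swap for your own.

```shell
# Print the N most recently modified entries directly under a directory.
newest_entries() {
  dir="$1"
  count="${2:-10}"
  ls -t "$dir" | head -n "$count"
}

# Usage on the NAS, once the pool is imported:
#   zpool status -v Photoshoots
#   newest_entries /mnt/Photoshoots 20
# Then, from another machine with SSH access (placeholder user/host/paths):
#   scp -r admin@truenas.local:/mnt/Photoshoots/some-folder /path/to/backup/
```

Opening a few of the newest files after copying them is a better test of readability than a directory listing alone.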
Loss of that ten seconds of data is probably a given at this point.
It looks like you might have a bad NV cache module on the controller - however, I would be exceedingly cautious when performing the swap. Carefully review the Dell manual and processes for disabling and fully flushing the cache (which will also hurt performance on the virtual disk) before fully exporting the pool and shutting the system down.
Unfortunately there is no way to convert the system to individual disks/passthrough or “IT/Initiator-Target” mode while keeping the data intact, because it’s a single virtual disk. You’d need to be able to hold the entirety of that ~40TB volume on an external set of disk(s) - ideally with redundancy - in order to be able to swap out the controller or reflash it to IT mode, and then copy it all back.