ZFS Pool Offline due to MegaRAID Protection

Hello. I’m sad that this is my first post here, but we’ve had an unfortunate failure with our relatively new TrueNAS-JBOD installation, and I’d appreciate some assistance with it.

Hardware: Ultrastar Data60 JBOD connected via SAS to a Broadcom MegaRAID 9580-8i8e in the server running TrueNAS
Offline Pool: 2 × DRAID2 | 16 × 18.19 TiB

During some maintenance work the JBOD was powered off and back on again without the TrueNAS server being shut down. I’ve investigated, and it appears that the MegaRAID saw the severed connections and took action: in the MegaRAID Configuration on the TrueNAS server, the Personality Mode has switched from JBOD to RAID and all the disks are showing as Unconfigured Bad.

I’m too new to post images so I’ll summarize them.

In the MegaRAID Configuration it says:
Controller Status Needs Attention
Personality Mode RAID
Drive Count 35
JBOD Count 0

The disks can all be seen but they say Unconfigured Bad.

I’m hoping this is not a data loss scenario. I have several options available, and I’d appreciate a pointer to the correct one.

  1. Switch to JBOD Mode option

This warns:
All existing Configurations on the controller must be deleted before proceeding.

  2. Force Switch to JBOD Mode

This says:
option will allow switching of the Personality Mode when a configuration is present. This should only be done as part of the controller replacement.
This operation will results in the following: 1. Enumeration of virtual drives will change. 2. All Virtual Drives will become foreign. 3. In JBOD mode virtual drives in RAID5/6/50/60 cannot be imported.

I’m also unsure whether I need to set the drives to Unconfigured Good before attempting one of those.

Thank you!
Stu

This is not good, and it’s not certain that you can turn the 9580 into a genuine HBA.

Best replace the 9580 with a 9300-8e, or 9305-8i8e if you need the same port configuration.

Thanks for taking the time to help.

It was performing well before this incident. When this was set up initially, it was necessary to set the drives to Unconfigured Good and then set JBOD. The autopersonality feature is set to NONE, so I have no idea why it has also flipped to the RAID personality. I would, however, expect the drives to be Unconfigured Bad, as the controller would have seen them forced offline.

Is setting the drives to Unconfigured Good and then force switching to JBOD mode going to cause issues with the presentation to ZFS because the virtual drive enumeration metadata changes?

Quite possibly.

The point is that RAID controllers work with ZFS… until they don’t. And then it may be too late to salvage anything.

It is a well-understood recommendation that RAID controllers should be flashed with IT firmware to completely disable RAID functionality, because if you don’t, these types of issues can occur.

I would personally say that whilst they may previously have been performing well, now they are performing extremely badly. Performance is not the most important measure of configuration quality - I would personally advocate that security against loss is the most important factor.

  1. ZFS transactional functionality is what should have ensured that powering off the external enclosure didn’t result in any data loss on the disks. Using a RAID card with RAID firmware configured in JBOD mode is NOT the same as the same card with IT firmware because A) ZFS needs to see the raw disk configuration and serial numbers and the RAID firmware can interfere with this even in JBOD mode; and B) the RAID firmware can resequence I/Os and break the transactional integrity that ZFS tries to ensure.

  2. IMO your best bet (if you haven’t already done so) is to reconfigure the RAID card back to JBOD mode and hope that the pool can be imported, possibly requiring a rollback to an earlier transaction.

  3. IMO you should NOT attempt to reflash to IT firmware whilst you still have data you hope to access on the disks. It is quite possible that doing this may result in ZFS being completely unable to import the pool.

So, the steps I think you need to take are as follows:

  1. Decide whether to try to bring this pool back online as-is or to recover from backup. Only you can determine the impact of the loss of any recent data.

  2. If you want to attempt to recover this data, then you should:

    • Start to provision new storage on a new correctly configured HBA where you can move data from this pool if it is able to be imported.
    • NOT do anything that might make things worse. Do NOT try running complex zpool import commands with -f or -F flags in the hope that they might fix things - they could easily make things worse. Get good advice from people with knowledge. If you can afford it, you might want to engage iX support on a paid basis to help you. But whoever gives you advice, you need to start providing them with full details of your environment so that the advice is specific rather than general.
  3. If you decide to restore from backups, then you should probably:

    • Flash your card to IT mode
    • Clean the disks to remove signs that they were previously part of a ZFS pool
    • Recreate the pool
    • Restore the data
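
For the "full details of your environment" part of step 2, a minimal Python sketch of a listing-only collector might look like the following. It assumes a Linux-based TrueNAS install with `zpool` and `lsblk` on the PATH; the function names are made up for illustration and are not part of TrueNAS or ZFS. It only ever runs `zpool import` with no arguments (which lists importable pools without importing anything), `zpool status`, and `lsblk`, and it refuses anything else, including -f or -F imports.

```python
import shutil
import subprocess

# Listing-only commands. Assumption: a Linux-based TrueNAS install where
# these binaries exist; adjust the list for your own system.
DIAGNOSTIC_COMMANDS = [
    ["zpool", "import"],   # no pool name: lists importable pools, imports nothing
    ["zpool", "status"],   # state of any pools that are currently imported
    ["lsblk", "-o", "NAME,SIZE,MODEL,SERIAL"],  # disks as the OS sees them
]


def is_listing_only(argv):
    """Return True only for the exact read-only commands listed above."""
    if argv[:2] == ["zpool", "import"]:
        # Any extra argument (a pool name, -f, -F, ...) is refused.
        return len(argv) == 2
    return argv in DIAGNOSTIC_COMMANDS


def collect_diagnostics(commands=DIAGNOSTIC_COMMANDS):
    """Run each allowed command and return {command string: output text}."""
    report = {}
    for argv in commands:
        key = " ".join(argv)
        if not is_listing_only(argv):
            report[key] = "skipped: not a listing-only command"
        elif shutil.which(argv[0]) is None:
            report[key] = "skipped: command not found"
        else:
            result = subprocess.run(
                argv, capture_output=True, text=True, check=False
            )
            report[key] = result.stdout + result.stderr
    return report


if __name__ == "__main__":
    for command, output in collect_diagnostics().items():
        print(f"$ {command}\n{output}")
```

Run as root, since scanning devices for importable pools will probably need it, and paste the output into your post along with the controller details.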

OK, yes certainly lesson learned, and I’m fortunate to not be dealing with as much data as there could be later down the line.

Thanks for that detailed advice @Protopia, much appreciated.

I haven’t tried to switch the modes yet, so I think we’ll purchase a new card and disks, as we were planning to populate the JBOD more anyway. We’ll create a new pool to restore the data to (in the hope we can access it). We don’t actually need any internal connectors, so an 8e would be fine. I assume the 9300-8e is recommended because it’s been road-tested for so long, but it’s quite hard for me to get hold of because we have to use specific suppliers. Would a second, IT-flashed 9580-8i8e be OK? I’d get a 9580-8e but I don’t see these readily available.

I guess not. You really want a basic HBA, and to avoid anything that reads “MegaRAID”: a 9300, 9400, or 9500 if you have to (the earlier generations are more trusted), but nothing with “high last two digits”.