Best practices for an external JBOD attached to TrueNAS?

Hello Everyone,

Running TrueNAS Scale 24.04.0

I currently have a 36-drive enclosure, which is full. It is configured with the following:
An LSI 9300-8i connected to the internal backplanes,
and an LSI 9300-8e linked to a 24-bay 2.5" Supermicro JBOD enclosure. Both are running in IT mode.

The external JBOD zpool is configured as a single pool consisting of 12 x 2-wide MIRROR vdevs using 1.8TB Intel SSDs.

The enclosure uses a CSE-JBOD controller and passes the drives over to the main system using an Adaptec AEC-82885T powered via the Molex connector (not a board).

I have been doing some testing and was wondering whether there are best practices to follow when using an external JBOD with TrueNAS, or whether it should just be avoided altogether.

I want to ensure that if something were to happen to the connection between the TrueNAS system and the JBOD enclosure, data would not be lost.

I know that spreading a pool across multiple enclosures would be a recipe for disaster, so I made a single pool consisting of only the (24) drives in the JBOD enclosure.

After the zpool was created I wanted to test whether it would survive the external enclosure powering off before the TrueNAS system, and also test pulling the power or the cables during a transfer.

When I power the enclosure off and then back on, the pool immediately goes into a degraded state. The same happens if the SFF cable linking the enclosure to the TrueNAS box is pulled. I did this to simulate a failure inside the JBOD enclosure (i.e., the CSE or RAID card dies, etc.) and see whether the data would survive.

I have since added a battery backup to the system and am in the process of configuring TrueNAS to shut down when an outage is detected. This will eliminate the danger of the enclosure losing power due to an outage, but it does not mitigate the risk of failed hardware.

The first test resulted in a failed pool, but when I tried to reproduce it a second time, the pool degraded and, after a reboot and scrub, came back online healthy. I assume this is because data was being written when the power was pulled in the first test, whereas the second happened with no activity?

Are there any settings which can prevent the array from degrading if it goes offline from the main node? Is there a specific export card that would be better than the Adaptec AEC-82885T?

Thank you

Do I understand correctly, you have 24 drives, and you have two stripes of 12 drives, and those are configured as two Mirrors?

If true, and I hope not, then yes, your data is at great risk. If you lose one drive from each stripe, your data is gone.

Hopefully it is 12 pairs of mirrored drives striped together. That is much better than the first layout, which I hope it isn’t.
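To put rough numbers on the difference between the two layouts, here is a minimal sketch. It assumes exactly two drives fail, chosen uniformly at random from the 24, with no resilver completing in between; the function names are made up for illustration. After the first failure, the pool is lost only if the second failure lands on a drive that the surviving data depends on.

```python
from fractions import Fraction

DRIVES = 24  # total drives in the JBOD pool, per the original post


def p_loss_mirrored_pairs(drives: int) -> Fraction:
    """12 x 2-wide mirrors: after one drive fails, only its single
    mirror partner (1 of the remaining drives) is fatal."""
    return Fraction(1, drives - 1)


def p_loss_two_mirrored_stripes(drives: int) -> Fraction:
    """Two stripes mirrored against each other: after one drive fails,
    any drive in the other stripe (half of all drives) is fatal."""
    return Fraction(drives // 2, drives - 1)


print("mirrored pairs:      ", p_loss_mirrored_pairs(DRIVES))
print("two mirrored stripes:", p_loss_two_mirrored_stripes(DRIVES))
```

Under those assumptions the mirrored-pairs layout loses the pool in 1 of 23 cases, against 12 of 23 for the two-stripe layout. Neither figure says anything about the whole enclosure dropping off the bus at once, which is a different failure mode.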

I don’t think anyone can make that statement.

There are folks here who have the proper experience to answer your questions.

Best of luck.

They are (12) pairs of mirrored drives.

I am hoping to minimize the chance of lost data if the JBOD disconnects. So far it seems like there is a 50% chance the pool fails.

As for the powering ON/OFF…

With the system powered off, are you powering up the JBOD enclosure first, waiting maybe 5 seconds (for the drives to all spin up, LOL, yes, wait 5 seconds even for SSD, they are not spinning up but it gives them a little time to come online and it’s good practice), then powering on the TrueNAS computer?

And with the system powered on, are you shutting down the TrueNAS computer first and then once it has shut itself off, only then are you turning the power off to the JBOD enclosure?

That is the proper method as I know it. I don’t know why you would have any dropped drives if doing it that way. But then again, I don’t have any experience with large systems like that.

Thanks Joe,

The problem is not with starting up the system properly. My question and issue relate to the sudden disconnection or loss of power to the external JBOD. When booting up, of course, the JBOD will always be up before the OS has had time to boot and bring the pool online. I am looking for help with those rare but unexpected cases where an external factor causes the JBOD to go offline.

From my understanding, ZFS should survive power outages by design, but it seems to have issues if the entire system does not lose power all at once.

What I am asking here is whether there are any common practices or settings to use with an external enclosure where this risk is present.

I assume the issue I have experienced in testing is due to the TrueNAS system already being online when the JBOD enclosure comes online and brings drives online at slightly different times. Is there a setting for example where if a pool disappears, it will not bring it back automatically until all disks are detected?
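This is not such a setting, but a minimal sketch of how the condition could at least be detected quickly from a script. It assumes `zpool list -H -o name,health` is available on the host, which prints one tab-separated `name<TAB>health` line per pool with no header. The pool name `jbod` in the sample and the function names are hypothetical, and it only reports state; it does not stop the pool from degrading or control when it is brought back.

```python
import subprocess


def unhealthy_pools(zpool_list_output: str) -> dict[str, str]:
    """Parse `zpool list -H -o name,health` output and return
    {pool_name: health} for every pool whose health is not ONLINE."""
    result = {}
    for line in zpool_list_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, health = line.split("\t")
        if health != "ONLINE":
            result[name] = health
    return result


def read_pool_health() -> str:
    """Run zpool on the live system (not called in this sketch)."""
    return subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        check=True, capture_output=True, text=True,
    ).stdout


# Hypothetical sample output; on a real system use read_pool_health().
sample = "boot-pool\tONLINE\njbod\tDEGRADED\n"
for name, health in unhealthy_pools(sample).items():
    print(f"pool {name} is {health}")
```

On a live system, the result of `unhealthy_pools(read_pool_health())` could feed whatever alerting is already in place.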

Thanks

The ZFS format if used properly can prevent data loss. Even if you use ZFS but use it in a single stripe, the data can be lost from a single drive failure. Proper design is key.

In my experience the common practice is to use the power On/Off sequence I described above. However, there are data center experts here who may have a better answer.

To answer the question, I’m not familiar with a specific setting. However, there are a lot of settings (“tunables”) which do many things, and that might be one of them.

The drives must be online first so the TrueNAS computer can recognize them. In my opinion you should not be powering both on at the same time, regardless of the fact that the JBOD enclosure should be ready to go almost immediately. I have been working with various computer systems for decades, and the external hardware is always powered on first, or there is a delay designed into the hardware to ensure the external devices are online before the main computers load software and come online.

The most recent computer system I worked on (built in a lab, debugged, operated, troubleshot) had 32 individual computers, all networked using fiber, ethernet, and a lot of other wiring. The peripherals had another 35 computers, four very fancy fiber network switches, and of course four Ethernet network switches. Don’t forget the miles of wiring (no exaggeration) as that adds to the timing fun. At the end of it all, the capability to connect to, program, and launch missiles. Neat stuff actually. But I’m not a data center expert, I know my equipment and just some personal history from decades of personal computer and electronics use.

Maybe someone else can offer some advice. Right now I need to go pack up my daughter's car so she can take a long drive back to her home.

Best of luck, hope you find what you desire.