TrueNAS in a production environment

Hello all,

While looking for a NAS solution that allows running VMs (light IO load), we realised that TrueNAS may be a good option, mainly because it would allow using our standard vendor (HPE), which would be beneficial from a hardware support standpoint. So, I’m looking at the DL20 Gen11 platform, or possibly something heavier (like the DL360, though I hope that won’t be required).
Disk-wise we are looking at a boot drive, two mirrored SSD drives for VM images, and 2-4 larger spinning disks for logfile storage (Mirrored).

Having read all the recommendations regarding having a specific (typically custom) HBA controller I seem to be stuck. From a business perspective working with custom flashed controllers is a no-go, for reasons of reliability, repeatability and support.

So, from what I can see there are three options left:

  1. Use any of the HPE controllers in HBA passthrough mode. These are Broadcom 956x controllers and according to HPE the passthrough mode exists specifically for deployments like this.
  2. Use “Direct Attached” disks. This basically means SATA only, in AHCI mode, using an onboard controller.
  3. Find an off-the-shelf HBA controller that can be installed within these HPE servers that would be fully compatible with ZFS.

The first option is by far the easiest. I must admit that I have never understood why such a rather expensive enterprise controller in HBA mode would not be “HBA enough” for ZFS. I can also imagine that Scale, being based on Linux, is more flexible in what it supports (compared to BSD previously). Is there a way to “test reliability” through a test setup?

The second option would be severely limited in throughput. Thankfully we are nowhere near reaching 600MB/s sequential speeds, and most important is random IO latency for VM use - small writes only. Is there a downside to going with AHCI?
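For reference, the 600MB/s figure is the per-port payload ceiling of a SATA 6Gb/s link once 8b/10b line encoding is accounted for. A minimal sketch of that arithmetic follows; the function name is only illustrative, and it ignores protocol overhead, so real-world throughput lands somewhat lower.

```python
def sata_payload_mb_per_s(line_rate_gbps: float) -> float:
    """Payload ceiling of one SATA link in MB/s, assuming 8b/10b encoding."""
    line_bits_per_s = line_rate_gbps * 1e9
    # 8b/10b encoding: every 10 bits on the wire carry 8 bits of data.
    payload_bits_per_s = line_bits_per_s * 8 / 10
    # Convert bits/s to bytes/s, then to MB/s (decimal megabytes).
    return payload_bits_per_s / 8 / 1e6


print(sata_payload_mb_per_s(6.0))  # SATA III, 6 Gb/s line rate
```

Note that this limit applies per port, so each disk on its own onboard SATA port gets its own ceiling.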

The third is not preferred as it won’t be certified with HPE. This likely means issues when we need HPE to go fix a hardware issue at a site, or the servers themselves may behave ‘irrationally’ and run at 100% fan speed as they may not be able to determine disk temperatures.
However, if we do want to go this way… is there any recommended SAS controller that would allow ‘True HBA’ without having to crossflash, from a reliable/enterprise vendor (Like Broadcom or similar)?

I’m fairly flexible on options, server platforms and such - but as this would be the opposite of a home-build, and we would need to set up many of these over the next few years … stability, reliability and repeatability are very important.

Your help is much appreciated!

Not sure where you got the “custom” part from, it’s certainly not true.

You’d think so, but Broadcom’s MegaRAID firmware team seems to think differently.

That’s not true in practical terms, since the recommendation always has been, and is likely to always be LSI/Broadcom SAS HBAs, specifically SAS3 these days. You could go with SAS2, but why? You definitely should not go with the unproven, completely reworked SAS4 generation.

Well, you’re limited to a handful of ports - especially once HPE gets their say (same goes for Dell or Lenovo). Beyond that, no.

LSI SAS 9300, 9400, 9500… Take your pick according to which cables you already have in the chassis.

Thanks Eric, much appreciated.

Perhaps I’m misunderstanding what an “IT Mode firmware” is, but I was under the impression that this is a firmware/option not officially supported / maintained by Broadcom (Or at least not for that specific card). Perhaps custom is not the right word, ‘unsupported’ is maybe a better description. Or am I misunderstanding IT firmwares?

I’m seriously curious in what way that is the case. Across the internet there are two very clear camps regarding this topic - including one saying that a raid card in HBA mode works fine with ZFS as that is its intended use.

If there is a problem using that method, obviously I would not want to proceed with that option. But opinions seem divided … and I would hate giving up on this approach only because that has been the recommendation since forever. Obviously the MegaRAID cards on the market now are very different from the old-style RAID cards of 10 years ago.

Not trying to discard your recommendation, however I am trying to understand the difference between MR-in-HBA-mode and a card that is only supporting HBA.

Is there anything tangible to test against? Certain SMART values/sensors/readings, a data stress test, or anything of that nature? When/how would an issue show?

Regarding ‘SAS 4’; are you referring to a PCIe gen4 card? The Broadcom website seems to only list the 9500 series as active at the moment, but those are all PCIe 4.0 Tri-Mode (and match with HPE cabling using SFF-8654 SlimSAS ports).

The 9300/9400 series would probably work fine as we don’t run NVMe drives, but as those are no longer featured on the Broadcom website I’m guessing that those are on their way out?

IT firmware is quite literally what Broadcom ships on their HBAs, it’s 100% official and immediately available.

Well, there’s a relatively abstract concern that it’s going to be buggier, as it’s seen a lot less testing - Broadcom’s MegaRAID line has supported HBA operation since the SAS2.5 controllers were launched (with a substantial rework of the driver), but it never really caught on for a variety of reasons.
The much more immediate concern is that performance is atrocious. Unbelievably terrible. I’m not even clear on why the hell it’s so terrible, but I’ve seen it for myself (albeit with Dell firmware, which is lightly customized). There’s no good reason for it.

Actually, they’re mostly the same. The whole SAS3 generation was a straightforward upgrade from the SAS2 line. The only major change was the tri-mode nonsense, which is a useless scam, starting with the 94xx cards.
The 96xx/SAS4 generation does represent a major change, with basically the whole stack being redone from scratch. While this presents opportunities for improvement, it also resets the clock on the many, many years of successful operations that the SAS3 cards (and SAS2 ones before them) have seen.

With modern MegaRAID cards, it’s basically just performance. Plus the potential for other bugs, but that’s hard to test for outside of a lab environment.
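Since performance is the testable part, one rough way to compare a MegaRAID card in HBA mode against a plain HBA is to time small synchronous random writes on the same pool layout behind each controller. The following is only a sketch under stated assumptions: the function names, block size, and file size are illustrative choices rather than anything from this thread, it needs a Unix-like system for `os.pwrite`, and a dedicated tool such as fio would be the more usual choice for a real evaluation.

```python
import os
import random
import statistics
import tempfile
import time


def measure_sync_write_latency(path, block_size=4096, count=200,
                               file_size=16 * 1024 * 1024, seed=0):
    """Return per-write latencies (seconds) for small random fsync'd writes."""
    rng = random.Random(seed)          # fixed seed: same offsets on every run
    payload = os.urandom(block_size)   # incompressible data
    blocks = file_size // block_size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)    # size the test file up front
        latencies = []
        for _ in range(count):
            offset = rng.randrange(blocks) * block_size
            start = time.perf_counter()
            os.pwrite(fd, payload, offset)
            os.fsync(fd)               # force the write down to stable storage
            latencies.append(time.perf_counter() - start)
        return latencies
    finally:
        os.close(fd)


def summarize(latencies):
    """Median, approximate 99th percentile, and max latency in milliseconds."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "median_ms": statistics.median(ordered) * 1e3,
        "p99_ms": ordered[p99_index] * 1e3,
        "max_ms": ordered[-1] * 1e3,
    }


if __name__ == "__main__":
    # Demo against a temporary directory; point `path` at a dataset on the
    # pool under test to get meaningful numbers.
    with tempfile.TemporaryDirectory() as tmp:
        print(summarize(measure_sync_write_latency(os.path.join(tmp, "probe.bin"))))
```

Running the same script with the same disks and pool layout behind each controller gives numbers that can be compared directly; a large gap in the median or tail latency would be the kind of difference described above.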

No, SAS4 controllers, namely the 96xx line.

Bizarre, but take it as an omen.

Not a problem.

Ignore that part, as planned.

Technically they’re discontinued. In practice, they’re widely available used, as new pulls, and as new old stock.