Stability Issues with TrueNAS Core server

Has potential to be a nice spinning-rust system.

If you want to host VMs, you’d want to reformat as a stripe of mirrors and add a decent SLOG device.

Otherwise, it’s probably better to go with a stripe of two six-way RAIDZ2 VDEVs.
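Roughly, from the CLI (hypothetical pool and device names; on TrueNAS you would normally build this through the GUI, the zpool commands are just to illustrate the two layouts):

    # Stripe of mirrors - better IOPS for VM/NFS workloads
    zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7
    # Optional SLOG device for sync writes (e.g. NFS-backed VMs)
    zpool add tank log nvd0

    # Stripe of two six-way RAIDZ2 VDEVs - better capacity for bulk storage
    zpool create tank raidz2 da0 da1 da2 da3 da4 da5 raidz2 da6 da7 da8 da9 da10 da11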

Do not use dedup. It appears this hardware is very far from the requirements to have it work well.
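If you want to check before rebuilding, something like this (pool name is a placeholder):

    # Is dedup enabled, and how big is the dedup table?
    zfs get dedup tank
    zpool status -D tank
    # Turning it off only affects newly written data; existing deduped
    # blocks stay in the DDT until they are rewritten or the pool is rebuilt
    zfs set dedup=off tank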

The OP hasn’t mentioned how full the pool is, either. If the pool is over 80% full, performance will crater.
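Easy enough to check (pool name assumed):

    # The CAP column shows how full the pool is, FRAG shows fragmentation
    zpool list tank
    zfs list -o space tank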

Another good point is the potential for SMR disks in that pool. They too would bring it to a crawl.
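You can read the drive models off the disks and cross-check them against the vendor’s CMR/SMR lists, e.g.:

    # Repeat for each disk; the Device Model line identifies the drive
    smartctl -i /dev/da0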

But dedupe is the most likely culprit.

Given what the pool is supposed to be used for, a sVDEV with a bunch of enterprise-quality SSDs could really speed things along; a SLOG could help too. Ditto a bunch of mirror VDEVs.
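For the record, adding those looks roughly like this (hypothetical device names; note a special VDEV must be redundant, because losing it loses the pool):

    # Add a mirrored special (metadata/small-file) VDEV of SSDs
    zpool add tank special mirror da12 da13
    # Add a SLOG for sync writes
    zpool add tank log nvd0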

Not at present, ISOs and Templates only.

HGST Ultrastar DC HC520 (He12)
Device Model: HGST HUH721212ALE604

Capacity at present: 500GB used of 130TB.

No VMs; ISOs and templates only.

What I don’t fully understand is that this is a stability issue, not a performance issue: the system goes non-responsive for hours while nothing is using the NFS exports. As I suggested in the OP, I was surprised to find no cache on this ZFS-based system.

Sadly, there may not be any budget to provide SSDs for this server, so I’ll destroy the pool, re-create it with no dedupe and no compression, and see if it is stable; otherwise a new OS is required.
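For what it’s worth, the rebuild itself is simple enough once the data is safe elsewhere (pool and device names are placeholders, and the layout is whatever we settle on):

    zpool destroy tank
    zpool create tank raidz2 da0 da1 da2 da3 da4 da5 raidz2 da6 da7 da8 da9 da10 da11
    # dedup defaults to off, but set both explicitly to be sure
    zfs set dedup=off tank
    zfs set compression=off tank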

Dedup is just that bad, to the point of causing things that depend on storage to just die of boredom due to I/O timeouts.


Compression is fine, lz4 or zstd (I believe).
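e.g. (pool name assumed):

    zfs set compression=lz4 tank
    # or, on recent versions
    zfs set compression=zstd tank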


I don’t think it’s an OS issue, or ever was.

If the pool can be nuked, and you want to use it for VMs, then I’d follow the advice re: using striped mirrors, not Zx VDEVs for that use case. For general purposes, a pool consisting of a couple of Z2 VDEVs should fly at several hundred MB/s unless you’re dealing with a lot of small files.

TrueNAS doesn’t use a fast front-end cache that destages writes later, the way some SSDs and SMR HDDs do. The best way to implement something like that is either an SSD scratch pool or a sVDEV where the dataset in question is designed to reside solely on the SSD sVDEV by designating all files in it as “small files”. sVDEVs are a great tool but have to be handled with great care. Given your experience level with TrueNAS, I’d likely avoid implementing one.
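The knob that does that, if my memory is right, is special_small_blocks (dataset name hypothetical): set it at or above the dataset’s recordsize so every block of that dataset lands on the sVDEV.

    zfs set recordsize=64K tank/scratch
    zfs set special_small_blocks=64K tank/scratch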

A fast Optane SLOG could help a bit, as could a persistent, metadata-only L2ARC. The former will be expensive (if you want a fast one); the latter can be done with just about any single SSD (mostly reads plus a few writes, and its contents are redundant, so an L2ARC failure will not affect the pool).
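Roughly (device and pool names hypothetical):

    # Add an L2ARC device and restrict it to metadata
    zpool add tank cache da14
    zfs set secondarycache=metadata tank
    # Persistent L2ARC is controlled by this tunable on recent releases (I believe it defaults to on)
    sysctl vfs.zfs.l2arc.rebuild_enabled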

Exactly what I was trying to subtly sniff out, to see whether even more insult had been added to the injury of the poor configuration.

And I mean performance can absolutely impact stability if the performance is just that horrible.

So thanks for the comments so far. What we have is a poorly configured, vendor-supplied NAS!

Oddly, stability seems to have improved and NOTHING has changed.

Anyhow we are copying the current data off, to reconfigure the storage.

So, if budget is not forthcoming and we are stuck with what we have, which is:

16 x HGST Ultrastar DC HC520 (He12)
Device Model: HGST HUH721212ALE604

for the storage pool (the OS is installed on SSDs),

what is the best configuration we can expect if we want to use it for ISOs, Templates and VMs? (These are not production workloads; we have vSAN for that.)

Well, a big question is what sort of VMs? A handful of small servers that need to store their boot device and little else? Or serious workstation sort of things that will be doing tons of I/O to disk?

Nothing strenuous; all of that is on vSAN. Simple Linux and Windows (Server) OSes, not VDI or workstations. This is just a scratch box really: cheap storage compared to Pure, NetApp, vSAN or all-flash arrays, somewhere to store ISOs and Templates, and some low-resource VMs.

Just OS Boot

e.g. a student wants to spin up Ubuntu Server for a test.

famous last words!

NFS just stopped today!

All hosts had the export mounted, and just lost access to datastores.

Checking: NFS 4.1 is not responding, and there is nothing in any of the logs as to why.

Trying to stop and start NFS hangs. All networking is fine; no jumbo frames, and this is on the older 1GbE network.
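A few things worth checking while it is wedged (run from the NAS shell and a client; commands assume TrueNAS CORE’s FreeBSD base, pool name is a placeholder):

    # Is the pool itself hung? A stalled pool will hang NFS with nothing useful in the logs
    zpool status -v tank
    # Is nfsd still running and listening?
    service nfsd onestatus
    sockstat -4 -l | grep 2049
    nfsstat -s
    # From a client: can the exports still be listed?
    showmount -e <nas-ip>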

So the initial problem we were trying to solve, has returned.

Nothing was accessing the storage at the time.