The OP hasn’t mentioned how full the pool is either. If the pool is over 80% full then performance would crater.
Another good point is the potential for SMR disks in that pool. They too would bring it to a crawl.
But dedupe is the most likely culprit.
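Both the fill level and the dedupe question are quick to check from a shell; a minimal sketch below, assuming the pool is named tank (swap in the real pool name):

```
# How full is the pool? Watch the CAP column; sustained use above ~80% hurts write performance.
zpool list tank

# Is dedup actually enabled anywhere, and how large has the dedup table (DDT) grown?
zfs get -r dedup tank
zpool status -D tank
```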
Given what the pool is supposed to be used for, a sVDEV with a bunch of enterprise-quality SSDs could really speed things along, and a SLOG could help too. Ditto a bunch of mirror VDEVs.
HGST Ultrastar DC HC520 (He12)
Device Model: HGST HUH721212ALE604
Capacity at present: 500GB used / 130TB total
No VMs; ISOs and templates only.
What I don’t fully understand is that this is a stability issue rather than a performance one: the system goes unresponsive for hours while nothing is using the NFS exports. As I suggested in the OP, I was surprised to find no cache configured for this ZFS-based system.
Sadly, there may not be any budget to provide SSDs for this server, so I’ll destroy the pool and re-create it with no dedupe and no compression and see if that is stable; otherwise a new OS is required.
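For my own notes, roughly the checks I intend to run before and after the rebuild (the pool name below is just a placeholder):

```
# Confirm what the vendor template actually enabled on the current pool
zfs get -r dedup,compression tank

# After re-creating the pool, make sure the new settings stuck
zfs get dedup,compression tank
zfs set dedup=off tank           # explicit, in case an imported config re-enables it
zfs set compression=off tank     # per the plan above
```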
If the pool can be nuked, and you want to use it for VMs, then I’d follow the advice re: using striped mirrors, not Zx VDEVs for that use case. For general purposes, a pool consisting of a couple of Z2 VDEVs should fly at several hundred MB/s unless you’re dealing with a lot of small files.
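To make the distinction concrete, a rough sketch of the two layouts at pool-creation time (device names and disk counts are placeholders; in practice you’d build this through the TrueNAS UI):

```
# VM / block-storage oriented: a stripe of 2-way mirrors
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5

# General-purpose bulk storage: one or more RAIDZ2 vdevs instead
zpool create tank \
  raidz2 da0 da1 da2 da3 da4 da5 da6 da7
```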
TrueNAS doesn’t use a fast front-cache to stage writes and file them away later the way some SSDs and SMR HDDs do. The best way to get that behaviour is either an SSD scratch pool or a sVDEV where the dataset in question is made to reside solely on the SSD sVDEV by designating all of its files as “small files”. sVDEVs are a great tool but have to be handled with great care. Given your experience level with TrueNAS, I’d likely avoid implementing one.
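For completeness only, since I’d hold off on it: a sVDEV is added and steered roughly like this (device names, the dataset name and the 1M cutoff are illustrative):

```
# Add a mirrored special vdev -- always mirror it, because losing it loses the pool
zpool add tank special mirror ssd0 ssd1

# Steer one dataset entirely onto the sVDEV by treating all of its blocks as "small"
zfs set recordsize=1M tank/fastdata
zfs set special_small_blocks=1M tank/fastdata
```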
A fast Optane SLOG could help a bit, as could a persistent, metadata-only L2ARC. The former will be expensive (if you want a fast one); the latter can be done with just about any single SSD (it sees mostly reads plus a few writes, and it only holds redundant copies of pool data, so an L2ARC failure will not affect the pool).
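A sketch of both, with placeholder device names (persistence across reboots is governed by the l2arc_rebuild_enabled tunable, which defaults to on in current OpenZFS):

```
# SLOG: a small, fast, power-loss-protected device (Optane is ideal)
zpool add tank log nvme0n1

# L2ARC on a single cheap SSD, restricted to metadata only
zpool add tank cache nvme1n1
zfs set secondarycache=metadata tank
```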
So thanks for the comments so far. The consensus seems to be that this is a poorly configured, vendor-supplied NAS!
Oddly, stability seems to have improved and NOTHING has changed.
Anyhow, we are copying the current data off so that we can reconfigure the storage.
So, if budget is not forthcoming and we are stuck with what we have, which is:
16 x HGST Ultrastar DC HC520 (He12)
Device Model: HGST HUH721212ALE604
for the storage pool, as the OS is installed on SSDs
What is the best configuration we can expect if we want to use it for ISOs, templates and VMs? (These are not production workloads; we have vSAN for that.)
Well, a big question is what sort of VMs? A handful of small servers that need to store their boot device and little else? Or serious workstation sort of things that will be doing tons of I/O to disk?
Nothing strenuous; all of that is on vSAN. Simple Linux and Windows (Server) OSes, not VDI or workstations. This is just a scratch box really: cheap storage compared to Pure, NetApp, vSAN or all-flash arrays, somewhere to store ISOs and templates, and some low-resource VMs.
Just OS Boot
e.g. student wants to spin up Ubuntu Server for test