Last year I bought myself a new server and installed TrueNAS SCALE on it. I bought a new HBA controller (9600W-16e) that was not yet supported by TrueNAS SCALE and replaced it with a 9400-8e, which solved that issue (see Broadcom 9600W-16e not supported (yet)? | TrueNAS Community) and let me connect my SAS expander to my TrueNAS SCALE box. Since then I’ve been struggling with low read speeds that vary from day to day: sometimes around 100 MB/s, sometimes around 200 MB/s (testing with a very large file).
It has nothing to do with the server itself, which is heavily overdimensioned, nor with the network, because once the file has been copied and I copy it again, it easily maxes out my NIC and stays at full speed (since it is then read from memory).
So I can pretty much narrow it down to the storage: either the HBA controller or the JBOD chassis.
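A quick way to confirm that the fast second copy really is served from RAM rather than from the disks is to watch the ARC hit rate while the copy runs, for example:

    arcstat 1        # one line of ARC statistics per second; a near-100% hit rate means reads are coming from memory
    arc_summary      # one-off dump of the full ARC statistics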
If I run storcli on TrueNAS it identifies:
the 9400-8e with firmware version 17.00.00.00, running at SAS 12G
the JBOD enclosure (AIC J2012-03-35X), running at SAS 12G
the 5 WD Ultrastar DC HC570 22 TB (SAS) drives as JBOD disks at SAS 12G (configured as a RAIDZ1 vdev)
I do see that under “Supported Adapter Operations” the controller reports “Support JBOD = No”, while the “Capabilities” section does state that it supports SAS and SATA and also JBOD.
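For reference, the information above comes from storcli queries along these lines (the /c0 controller index is just what it enumerates here and may differ on another system):

    storcli show                  # list controllers and their index
    storcli /c0 show all          # controller details: firmware, driver, capabilities
    storcli /c0/eall show         # enclosures attached to the controller
    storcli /c0/eall/sall show    # physical drives and their negotiated link speed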
With the hardware I have I should easily be able to go well over 500 MB/s, which my old TrueNAS CORE server with much worse hardware was capable of. So I’m really out of ideas on how to fix this. Does anybody have an idea what the cause could be?
How are you connecting the drives to the HBA? Those enclosures use an LSI SAS35X40 chip in their expanders, and you should be using SFF-8644 Mini SAS HD cables to connect the HBA to them. Are you sure the HBA itself is properly cooled? What’s the output of sas3flash -list?
Regarding the performance: what is the size of the files you used for these tests?
Cooling is abundant in both the server and the JBOD; they both sound like airplanes in my rack, so that is definitely not the problem. If I look at the temperatures, they are all very low (I have a BMC on both the JBOD and the server, so I can easily read them out). I’m using SFF-8644 Mini SAS HD cables to connect them.
sas3flash -list doesn’t work (it doesn’t find any adapters). I think it can’t be used on 9400-series and newer HBA cards.
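From what I can tell, the 9400 is a tri-mode (SAS 3.5) card, so the details sas3flash would normally report are read through storcli instead; roughly like this (the /c0 index is again just an example):

    storcli /c0 show        # adapter summary, including firmware package version
    storcli /c0 show all    # full details: firmware, driver, capabilities, connectors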
The files I use for testing are typically 20 GB+.
On a side note, I don’t have the same slow speeds when uploading to my TrueNAS (writes), but maybe that’s because the data is buffered in memory first?
We really need to know more about the rest of the system - e.g. if you only have the five HDDs, then your performance numbers, though low, are not crazy.
TrueNAS SCALE is running directly on the server, so no virtualisation or anything.
I’m quite surprised those speeds would be considered normal for such a system. I was easily maxing out the 5 Gb NIC on my old TrueNAS CORE server, which had much worse hardware and slower disks (albeit six of them).
Well, you should! It’s pretty hard to get faster writes than reads.
I would guess there is something going on with either the HBA/expanders or the cables.
Run jgreco’s solnet-array-test | TrueNAS Community and report the results please.
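As far as I understand it, that script essentially runs dd-style sequential reads against every member disk (first one at a time, then in parallel). If you want a quick manual spot-check first, something along these lines works per disk (the device name is a placeholder; it only reads, never writes):

    # ~20 GB sequential read straight from one member disk
    dd if=/dev/sdX of=/dev/null bs=1M count=20000 iflag=direct status=progress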
I played around with my JBOD chassis again, and I noticed (I’ve seen this before) that all of a sudden I get max speeds from my disks. The chassis has dual expanders, both connected to the HBA controller. I disconnected one, so I’m only going over one cable instead of two, rebooted the JBOD and TrueNAS, and bam, full speed now. I’ve done that once in the past too, and the next day it was fubar again.
Let’s hope that it stays stable this time
Update: not stable anymore, now it’s a yo-yo, bouncing between 100 MB/s and 250 MB/s. I don’t understand it, because I didn’t change anything and the temperatures of all the components are really low.
Update 2: after another big file transfer (80 GB), it yo-yoed for the first minute, and now it’s maxing out my network again and staying stable …
This test has been running for quite a long time now and my disks keep holding max speed (I’m following it in the Reporting section, so it doesn’t seem to be a heat issue; see the screenshots below).
With this test it seems it’s neither the JBOD nor the HBA controller, since it keeps all disks running at max speed (around 270 MB/s each). If I interpret it correctly, the disks, SAS expander and HBA controller should all be fully stressed by this and are running at max speed, which should add up to 5 x 270 MB/s. Or am I wrong?
I couldn’t keep the console open long enough, so the solnet test stopped, but over the entire time it ran (hours) my average speed per disk was 260 to 275 MB/s.
That should mean the entire SAS chain (HDDs, expander, controller) is pushing over 1 GB/s, right?
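One way to sanity-check that aggregate number while a transfer is running is to watch per-disk bandwidth on the pool itself (the pool name below is a placeholder):

    # per-vdev/per-disk read and write bandwidth, refreshed every 5 seconds;
    # 5 disks at ~270 MB/s should show roughly 1.3 GB/s of aggregate reads
    zpool iostat -v tank 5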
Looks like most of the slowness is stemming from syncq_wait
Asyncq_wait and syncq_wait are, respectively, the time that (async) data spends waiting to commit to disk in TXGs, and the time that sync data spends waiting to commit to the ZIL. Neither of these columns includes disk service time.
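Those columns come from the latency view of zpool iostat; to watch them live during one of the slow transfers, something like this works (the pool name is a placeholder):

    # average latency breakdown per vdev every 5 seconds
    # (total_wait, disk_wait, syncq_wait, asyncq_wait, ...)
    zpool iostat -v -l tank 5

    # or the full latency histograms, which make queueing stalls easier to spot
    zpool iostat -w tank 5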