Struggling with SAS speeds

Hello all,

Last year I bought myself a new server and installed TrueNAS SCALE on it. I also bought a new HBA controller (9600W-16e), which was not yet supported by TrueNAS SCALE, so I replaced it with a 9400-8e. That solved the issue (see Broadcom 9600W-16e not supported (yet)? | TrueNAS Community) and I was able to connect my SAS expander to my TrueNAS SCALE box. Since then I've been struggling with low read speeds that vary from day to day: sometimes around 100MB/s, sometimes around 200MB/s (testing with a single huge file).

It has nothing to do with the server, which is heavily overdimensioned, nor with the network: once the file has been copied and I copy it again, it easily maxes out my NIC and stays at full speed (since it's then read from memory).

So I can pretty much narrow it down to the storage side: either the HBA controller or the JBOD chassis.

If I run storcli on TrueNAS, it identifies:

  • 9400-8e with firmware version 17.00.00.00 using SAS-12G
  • the JBOD enclosure (AIC J2012-03-35X) using SAS-12G
  • the 5 WD Ultrastar DC HC570 22TB (SAS) as JBOD disks at SAS-12G (running in RAIDZ1)

I do see that under "Supported Adapter Operations" the controller reports "Support JBOD = No". The "Capabilities" section, however, does state that it supports SAS & SATA and also JBOD.
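For reference, something like the following storcli queries should show that information (a sketch, assuming the 9400-8e enumerates as controller 0):

storcli /c0 show all (adapter info, firmware, and the "Supported Adapter Operations" / "Capabilities" sections)
storcli /c0/eall show (the attached enclosures)
storcli /c0/eall/sall show (per-slot drive state and negotiated link speed)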

With the hardware I have, I should be able to go well over 500MB/s; my old TrueNAS CORE server with much worse hardware was easily capable of that. So I'm really out of ideas on how to fix this. Does anybody have an idea what the cause could be?

How are you connecting the drives to the HBA? That chassis uses an LSI SAS35X40 chip in its expanders, and you should use SFF-8644 Mini SAS HD cables to connect the HBA to it. Are you sure the HBA itself is being properly cooled? What's the output of sas3flash -list?
About the performance: what is the size of the files you used during these tests?

Hi Davvo,

Cooling is abundant in both the server and the JBOD; they both sound like airplanes in my rack, so that is for sure not the problem. If I look at the temperatures, they are all very low (I have a BMC on both the JBOD and the server, so I can easily read them out). I'm using SFF-8644 Mini SAS HD cables to connect them.

sas3flash -list doesn't work (it doesn't find any adapters). I don't think it can be used on 9400-and-newer HBA cards.
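On these tri-mode cards storcli seems to take over that role; something like this should list the adapter and its firmware package (again assuming it shows up as controller 0):

storcli show (lists all detected controllers)
storcli /c0 show (firmware package, driver version, basic status)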

The files I use for testing are typically 20GB+.
On a side note, I don't see the same slow speeds when uploading to my TrueNAS (writes), but maybe that's because the data is loaded into memory first?
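If I wanted to verify that, I guess something like the following would show whether the pool forces synchronous writes (Storage being my pool); with sync=standard, a normal SMB copy is treated as async and gets buffered in RAM before it hits the disks:

zfs get sync Storage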

We really need to know more about the rest of the system - e.g. if you only have the five HDDs, then your performance numbers, though low, are not crazy.


So you get 100 to 200 MB/s during read operations FROM the NAS and higher during write operations TO the NAS?

As far as I understand, it should work. Do sas3flash -listall or sas2flash -list not show anything?

The server is a Supermicro SYS-110D-8C-FRAN8TP:

  • Intel® Xeon® Processor D-2733NT (8 core - 16 threads)
  • 128GB RAM: 4 x 32GB DDR4 2933MHz ECC REG RDIMM
  • TrueNAS boot = SSD D7-5520 4TB PCIe 4.0 x4
  • NIC used: SFP28 25 GbE LAN port(s) (Intel® SOC)
  • PCIe 4.0 x16 (used for the 9400-8e)

TrueNAS SCALE is running directly on the server, so no virtualisation or anything.

I'm quite surprised those speeds would be considered normal for such a system. I was easily maxing out the 5GbE NIC in my old PC against my old TrueNAS CORE server, which had much worse hardware and slower disks (albeit six of them).

Both show that no Avago SAS adapters were found. And yes, write operations are faster, but I'm not surprised by that because I have 128GB of RAM in that server.

If I look at the HDD monitoring, the reads max out at around 40-45MB/s per disk and the writes at around 95-100MB/s per disk.
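If it's useful, I can also sample per-disk throughput live from the shell while copying, roughly like this (Storage is the pool, 5 is the sample interval in seconds):

zpool iostat -v Storage 5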

Well, you should be! It's pretty hard to get faster writes than reads.
I would guess something is going on with either the HBA/expanders or the cables.
Run jgreco's solnet-array-test | TrueNAS Community and report the results, please.
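If you want a quick manual spot-check in the meantime, sequentially reading each raw disk should reveal whether a single drive is dragging the pool down. A rough sketch (the device names are just examples, substitute your actual SAS disks; it only reads from them):

for d in sda sdb sdc sdd sde; do echo "== $d =="; dd if=/dev/$d of=/dev/null bs=1M count=4096 iflag=direct status=progress; done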


I played around a bit with my JBOD chassis again, and I noticed (I've had this before) that all of a sudden I get max speeds from my disks. The chassis has dual expanders, both of which were connected to the HBA controller. I disconnected one, so I'm only going over one cable instead of two, rebooted the JBOD and TrueNAS, and bam, full speed now. I've done that once in the past too, and the next day it was fubar again.
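One thing I still want to rule out: these SAS drives are dual-ported, so with both expander modules cabled to the same HBA each disk can show up over two paths. Something like this should reveal whether the same serial appears under two device nodes (a sketch, the column list may need tweaking):

lsblk -S -o NAME,HCTL,TRAN,VENDOR,MODEL,SERIAL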

Let’s hope that it stays stable this time :slight_smile:

Update: not stable anymore. Now it's a yo-yo, bouncing between 100MB/s and 250MB/s. I don't understand it, because I didn't change anything and all the temperatures are really low for all the components.

Update 2: after another big file transfer (80GB), it did the yo-yo thing for the first minute, and now it's maxing out my network again and staying stable …
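In case it helps with debugging the slow phases, I can check whether the SAS links are logging errors or resets while it is yo-yoing, roughly along these lines (sdb is just an example device):

dmesg | grep -i mpt3sas | tail -n 50
smartctl -a /dev/sdb | grep -A 8 "Error counter log"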

I am running the solnet test; it seems to be a long one, but I can already show this:

Yeah, it's a test that takes at least a few hours. Anyway, I think you are dealing with heat issues if you are throttling.


This test has been running for quite a long time now and my disks keep up their max speed (I'm following it in the reporting section, so it doesn't seem to be a heat issue; also see the screenshots below).

With this test it seems it's not the JBOD, nor the HBA controller, if it keeps all disks running at max speed (around 270MB/s each). If I interpret it right, the disks, SAS expander and HBA controller should all be fully stressed by this test and are running at max speed, which should be 5 x 270MB/s. Or am I wrong?
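Back-of-the-envelope, if those numbers hold: 5 x ~270MB/s is about 1.35GB/s raw, and even counting only the four data disks in each RAIDZ1 stripe that is still roughly 4 x 270 = ~1.08GB/s, way above the 100-200MB/s I see over the network.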



Start moving some files around as you would normally. Not while the soltest is running.
Can you run:
zpool iostat -vvyl 60

It’ll sample for a minute and dump some text to the console for you to copy and paste back here.
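(-v breaks the statistics out per vdev and disk, -y skips the since-boot summary so you only see the sampled interval, -l adds average latency columns, and 60 is the sample length in seconds.)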

I'm afraid the soltest will be running for almost a day with these 22TB disks, lol. I'll do it after the soltest finishes.

Thx :wink:
Z

I couldn't keep the console open long enough, so the soltest stopped, but over the entire time it ran (hours) my average speed per disk was 260 to 275MB/s.

This should mean that the entire SAS chain (HDDs, expander, controller) is pushing over 1GB/s, right?

Hi Nick,

this is the sample:

You shouldn't use the console. Use an SSH session, and on top of that, a tmux session. The console closes too quickly.
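Roughly like this (a sketch):

tmux new -s soltest (start a named session and launch the test inside it)
Ctrl-b then d (detach, leaving it running)
tmux attach -t soltest (reattach later to check on it)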


Looks like most of the slowness is stemming from syncq_wait

Asyncq_wait and syncq_wait are, respectively, the time that data (async or sync) spends waiting to be committed to disk in TXGs, and the time sync data spends waiting to be committed to the ZIL. Neither of these columns includes disk service time.

zpool-iostat.8 — OpenZFS documentation
OpenZFS: Using zpool iostat to monitor pool performance and health | Klara Inc (klarasystems.com)

Can you throw a -q on the end of that command and run again?

Also zfs get all Storage
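If the full dump is too long to paste, the properties I would look at first are probably along the lines of:

zfs get sync,recordsize,compression,atime,primarycache Storage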

This time with -q :slight_smile:
For the second batch I read two big files, and then it maxed out the connection on this computer (2.5GbE).

and the zfs get all Storage output:

Hmm. Seeing a bunch of pending reads. What protocol are you using to share? Are you moving around a large-ish quantity of small files?

Can you do an iperf3 test between the client and the TrueNAS?

You’d do iperf3 -s on the TrueNAS and then iperf3 -c TRUENAS_IP on the client.
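It can also be worth testing the reverse direction and parallel streams, e.g. iperf3 -c TRUENAS_IP -R for NAS-to-client and iperf3 -c TRUENAS_IP -P 4 for four streams at once.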