That’s expected since it’s pulled from cache and not from the pool. Your network is ok.
Yeah, I know; that’s why I know it has something to do with the storage. But the weird thing is that if I stress the disks on the server itself with soltest, the controller and JBOD don’t seem to be the issue, as even with parallel runs it maxes out the individual speed of the disks.
Try using a 1M record size; you’re using the 128K default right now. That should help with single-stream large file copies.
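If the share is backed by a dataset, something along these lines should do it; a minimal sketch assuming a dataset called tank/share. Note that only newly written files pick up the new record size:

```
# Check the current record size (128K is the ZFS default)
zfs get recordsize tank/share

# Raise it to 1M for large sequential files; existing files keep
# whatever record size they were written with.
zfs set recordsize=1M tank/share
```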
That’s true, but it shouldn’t impact his performance this much. I think there is something wrong between the drives and the motherboard.
Could be, but there are still unknown variables. You can see the wavy lines in his file-transfer speed; it may be some sort of physical problem causing TCP windowing. Have you tried another patch cable?
Edit:
Ehh maybe not hmm
@Zormik These are both READS from the TrueNAS to something like a local SSD?
Yes they are.
I still haven’t found a solution. Single-file speeds are crap. If I copy several big files at once, I easily max out the network speed (around 6Gbit/s).
If I look at the reports on my TrueNAS SCALE, the speed of the individual disks fluctuates a lot. What I can mostly boil it down to is what @NickF1227 said: the asyncq_wait and syncq_wait values are abnormally high.
I’m not experienced enough to have any clue what could cause data to wait that long to commit to the ZIL.
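For context, those same wait columns can be checked from the shell with zpool iostat; a minimal sketch, assuming a pool named tank:

```
# Per-vdev average latencies; the syncq_wait and asyncq_wait columns
# are the queue-wait times in question. Refreshes every 5 seconds.
zpool iostat -v -l tank 5

# Full latency histograms, including sync/async queue waits
zpool iostat -w tank 5
```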
P.S. I also did a copy directly through the shell from the disk array to an SSD (both on the same server) and it gave exactly the same result.
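To be clear, by a shell copy I mean something along these lines, which takes SMB and the network out of the picture entirely (the paths are just examples):

```
# Copy a large file from the HDD pool to a dataset on the SSD,
# reporting throughput as it goes.
dd if=/mnt/hddpool/share/bigfile.mkv of=/mnt/ssdpool/bigfile.mkv bs=1M status=progress
```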
The only other advice I have would be to run the same test locally against an internal SAS HBA, taking the JBOD out of the equation.
I found the issue. I put 3x10TB disks in RAIDZ1 on the same JBOD chassis and, guess what, 400MB/s read speed…
So I can pretty much narrow it down to the disks now…
Is there a way I could test whether there is a faulty disk, or can I rebuild the RAIDZ1 without having to copy everything over to another share? (I have no other storage that can hold this much data.)
If this is a production system, is the data not important enough to be backed up somewhere? Or is just not all of it backed up?
If you rebuild, you have to move the data off or be sure you have a full backup.
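If you want to rule out a single bad drive first, a long SMART self-test on each member is a cheap starting point; the device names below are examples, adjust to your disks:

```
# Start an extended self-test on one drive (repeat for each disk)
smartctl -t long /dev/sda

# Once the test has finished, check the result and error counters
smartctl -a /dev/sda
```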
Hi @Zormik
Did you ever fully solve this?
I’m in a similar situation. Quite beefy Supermicro 32-core server with 512GB RAM, latest SCALE, 24x10TB Seagate IronWolf drives as 12 striped mirrors. Pool read speed seems to be capped at 250MB/s, while writes are well over 1.5GB/s.
Tested with ARC off; local tests with fio and remote tests over SMB show the same. Reading from ARC over SMB I get 15Gb/s on the 40GbE link, so the network doesn’t seem to be the issue.
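For reference, the local fio test was something along these lines, with the dataset path and sizes purely illustrative:

```
# Single-stream sequential read against the pool; primarycache on the
# test dataset was turned off so ARC doesn't serve the reads.
fio --name=seqread --directory=/mnt/tank/fio-test --rw=read \
    --bs=1M --size=50G --numjobs=1 --ioengine=posixaio \
    --group_reporting
```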
No disk has SMART errors and there doesn’t seem to be a single disk standing out with strange values.
How did you solve it? Were you able to identify the bad disk? I would hate to have to replace all 24 HDDs.
Hi, sorry for the late reply.
No, I was never able to get it working consistently. There are no errors on the disks, and starting to replace disks in an 80TB pool is too much hassle just to fix a speed issue.
Sorry I can’t help you here.
I have noticed that with SAS drives, if the internal cabling or the JBOD is wired for multipath SAS, it can cause issues. It’s briefly mentioned in this thread.
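One quick way to spot that on SCALE is to check whether any drive shows up as two block devices sharing the same WWN or serial; this is just one way to look, not a definitive diagnosis:

```
# A disk reachable over two SAS paths (without multipath configured)
# can appear twice with the same WWN/serial.
lsblk -o NAME,SIZE,WWN,SERIAL,MODEL

# The by-path symlinks also show how many paths lead to each device.
ls -l /dev/disk/by-path/
```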