As I previously commented, this statement really depends on what you are trying to measure.
When you use dd to read a large file, you have only a single thread requesting data, and it does so sequentially. This triggers sequential read-ahead, which reads a few blocks ahead of the one last requested. Throughput on a single-disk vDev will be limited by the slower of the disk's throughput and the single-core dd process itself (most likely the disk). In a mirror, however, you need to know how these reads take advantage of the mirror copies to run in parallel: will sequential read-ahead alternate between the disks or not?
The reason I suggested multiple read streams is to avoid needing to know the answers to those questions. With two or more dd streams running in parallel, you are much more likely to make disk read throughput the constraint and keep the disks at 100% utilisation, because the dd instances will run in parallel on multiple cores.
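For example, something along these lines would give you two independent sequential read streams against two separate files (the paths are purely illustrative):

```
# Two sequential readers in parallel, each against its own copy of a large file.
dd if=/mnt/hdd-pool/disktest/copy1.bin of=/dev/null bs=1M status=progress &
dd if=/mnt/hdd-pool/disktest/copy2.bin of=/dev/null bs=1M status=progress &
wait
```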
When you use dd to write a large file, then all writes are made to both drives, so parallelism is less important to get both drives running at 100%.
In both cases, I think you should look at the disk stats (utilisation, throughput) rather than the dd stats.
As I have said previously, my input is based on previous general performance testing experience and not on specific performance testing of Debian or TrueNAS.
But if you are currently running a single dd command reading a single file from a shell, then all you need to do is copy the file without block cloning so that you have a second copy, and then run a dd command against each of the two files from two shell windows.
You also need to use the Disk I/O reports to get the throughput measurements rather than relying on the dd throughput. If you start with one dd command and keep increasing the number of parallel dd commands until the graph stops increasing, you will likely have found the maximum throughput.
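As a rough sketch of the whole procedure (pool, dataset and file names are just examples): use dd rather than cp to make the copy, since cp on a recent ZFS may use block cloning, and a cloned copy would share the same on-disk blocks as the original and defeat the test.

```
# Rewrite the data with dd to get a genuinely separate on-disk copy (no block cloning).
dd if=/mnt/hdd-pool/media/bigfile.bin of=/mnt/hdd-pool/media/bigfile-copy.bin bs=1M

# Ideally export/import the pool or reboot first so neither file is still sitting in ARC.

# Then read both files in parallel (from two shells, or with '&' as below) and watch
# the Disk I/O reports / zpool iostat, adding more readers until throughput stops rising.
dd if=/mnt/hdd-pool/media/bigfile.bin of=/dev/null bs=1M &
dd if=/mnt/hdd-pool/media/bigfile-copy.bin of=/dev/null bs=1M &
wait
```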
Sorry… but I'm not following what you mean by "without block cloning" and running against the two files from two shells. If you can provide some clear steps it would help a lot!
Is this test representative of how TrueNAS would behave with a single SMB user requesting to read a large 195 GB file? Or is it testing how multiple users reading different files would scale (not my use case)?
@nvs if you are not so comfortable using shell commands, you can also use AJA System Test on a client. I use it a lot. It is important to disable compression, as AJA writes compressible data. Or, for fully real-world tests, write and read video files using DaVinci Resolve. Observe what your disks read and write with zpool iostat -v 1. Or you can start a scrub and watch the read speed with zpool iostat.
Another hint, and I really don't mean it in an offensive way: a lot of your questions on how to use the commands for benchmarking could be answered by ChatGPT. I use it a lot. It is just much faster than keeping a cheatsheet or googling, Stack Exchange, etc.
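For reference, the shell side of that is only a couple of commands (pool/dataset names are examples):

```
# Disable compression on the dataset AJA writes to, since AJA's test data is compressible.
zfs set compression=off hdd-pool/disktest

# Watch per-disk read/write throughput once per second while the test runs.
zpool iostat -v hdd-pool 1

# Or start a scrub (which reads all allocated data) and watch the read speed the same way.
zpool scrub hdd-pool
```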
No problem using the shell here, but being two weeks into testing I want to be absolutely clear that I am applying the correct steps/commands, so that my results are in line with other users' tests, are useful, and I don't have to do it all again. As said, I will only run this fio test tomorrow, then take the machine back into normal operation, and that will likely be the end of this adventure for me. So tomorrow is the last opportunity to do this correctly on this system.
I appreciate the hint on ChatGPT, but I have seen it produce incorrect output in the past (actually quite often) when it comes to things like this. And at the stage where I am, I want to be absolutely sure we are on the same page and running identical tests with the correct commands. I hope you get my point.
I just have a comment: why are you using numjobs=1 for fio? When iX says read performance will scale with vdevs, do you think they mean with only one process? I would think they mean a normally operating system with many processes doing many things, not one job.
I have not read the whole thread. My other comment is just to make sure that when you add a vdev you rebalance the pool, as otherwise everything is still on the one vdev that was there before the addition.
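One common way to rebalance is simply to rewrite the data after the new vdev is added, for example by replicating the dataset and swapping it in. A rough sketch only, with example dataset names; check free space and existing snapshots before trying this on a real pool:

```
# Replicate the dataset so its blocks are rewritten and spread across all vdevs.
zfs snapshot -r hdd-pool/data@rebalance
zfs send -R hdd-pool/data@rebalance | zfs receive hdd-pool/data-rebalanced

# After verifying the new copy, retire the old dataset and rename the replacement.
zfs destroy -r hdd-pool/data
zfs rename hdd-pool/data-rebalanced hdd-pool/data
```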
Hi @nvs or @simonj, can you please test the read speed with prefetch on and off? I remember a very old thread in another forum (I can't find it any more) where it had a huge influence on the read speed of a 2 vdev 4 drive mirror. I'm not a command line expert.
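For whoever runs it: as far as I know, prefetch can be toggled at runtime on Linux via the zfs_prefetch_disable module parameter (a sketch, run as root; please double-check on your system):

```
# Disable ZFS file-level prefetch (1 = disabled, 0 = enabled, which is the default).
echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable

# ...run the read test here...

# Re-enable prefetch afterwards.
echo 0 > /sys/module/zfs/parameters/zfs_prefetch_disable
```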
I am explicit about where the test files are put: different directories for different parallel jobs, and no filename, so that it can create 5 files for 5 jobs. You need to change the pool and directory names and create the directories it needs before running this.
EDIT: On reflection, /mnt/hdd-pool/disktest should probably be a separate dataset so that you can set dataset parameters such as compression to off.
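For reference, the shape of the command I am describing is roughly as follows; this is a sketch with example pool, dataset and size values, not the exact command above:

```
# Separate dataset with compression off so fio's data is not compressed away.
zfs create -o compression=off hdd-pool/disktest
mkdir -p /mnt/hdd-pool/disktest/job1 /mnt/hdd-pool/disktest/job2 /mnt/hdd-pool/disktest/job3 \
         /mnt/hdd-pool/disktest/job4 /mnt/hdd-pool/disktest/job5

# Five parallel sequential readers, distributed across the directories
# (no filename given, so fio creates one file per job).
fio --name=seqread --rw=read --bs=1M --size=20G --numjobs=5 --group_reporting \
    --directory=/mnt/hdd-pool/disktest/job1:/mnt/hdd-pool/disktest/job2:/mnt/hdd-pool/disktest/job3:/mnt/hdd-pool/disktest/job4:/mnt/hdd-pool/disktest/job5
```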
Thanks! Just for total clarity: these fio commands I should now run one after the other (not in parallel as suggested before), correct? And while each is running, towards the end of each fio test, note down the output of the iostat command.
I don't think prefetching makes sense with parallel jobs, but I'm not sure. @Protopia can shed some light on this by showing his test results. My expectation is that prefetching is efficient with a single job, but that with multiple jobs performance drops.
The numjobs=5 runs 5 tasks in parallel for you. So yes - in parallel as suggested before.
It is easy to do a matrix of runs, so I did it with prefetch as well. It seems to give a proportionately similar boost with 5 jobs as it did with 1. I am assuming that 5 jobs has worse performance because of seeks.
But I have no idea why prefetch is still beneficial when I do this.
Reviewing the netdata stats for ZFS ARC, there were a LOT of ARC hits during this period - either c. 50% or c. 75% depending on which graph you look at. Whilst this is much lower than the 99%+ I get in normal usage, it could simply be the metadata being reused repeatedly. However, I am not fully convinced that setting primarycache=metadata cleared the existing ARC cache or prevented new data from being cached.
EDIT: Later graphs separated out metadata and data ARC rates, and these clearly showed that data caching was off; you could also see when prefetch was on. So I now think the prefetch/caching commands worked as expected and that the measurements were therefore done correctly.
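For anyone repeating this, the commands involved are roughly the following (dataset name is an example):

```
# Cache only metadata in ARC for the test dataset, so data reads must come from the disks.
zfs set primarycache=metadata hdd-pool/disktest

# Setting the property does not evict data that is already cached; exporting and
# re-importing the pool (or rebooting) clears the ARC before a test run.
zpool export hdd-pool
zpool import hdd-pool

# Watch ARC hit/miss rates while the test runs (arcstat ships with OpenZFS).
arcstat 1
```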
But imagine a production environment with many things going on and multiple mirror vdevs (say 3), with a 1M block size. Imagine files that fit in one block, so no streaming. Some files will come from vdev 1, some from vdev 2, some from vdev 3; the reads do not all start from vdev 1, of course. And within each mirror, the least busy drive (it's not round robin) will handle the read. In that case you get far more performance benefit (measured) than from, say, a single simple media stream. It all depends on what exactly you are measuring. As for the comment about multiple vdevs giving twice the performance: you may or may not get twice, and I doubt it, but depending on settings, workload, etc., you could.
These tests here are more believable in my book, and you have already mentioned it:
He didn't get 2x, but again, it depends on the workload.
Prefetch is of course on by default, but does not prefetch everything, there is logic to it. I would leave it on in most cases.
If you are taking the iX document literally, then that is not what is going to happen, at least with the tests you are doing. Real-world tests show what you are seeing. I don't believe you have found anything wrong at all, just what is expected.
This is not a general performance issue we are investigating - it is a very specific issue with mirrored drives, where a mirror should perform reads better than a single drive and it doesn't.
To prove that there is an issue, we are trying to get to the simplest situation in which we can demonstrate it, and that is a single vDev consisting of either one drive or a mirror of 2 or more drives.
Please let's stay focused on proving this so that we can hand it off to a support team who can reproduce the issue and will then take it on to diagnose the cause and create a fix.
P.S. My own performance measurements were provided only incidentally - I don't have a mirror on my system, nor any spare drives, so I cannot test the specific issue. I was simply providing a script, and someone then asked to see my measurements.
Ok, I read an awful lot into what I was commenting on here, but I now know that you are on it. I do have a single mirrored vdev, so if you end up with any commands you want compared against one of the contributors having the issue, let me know; I'd be happy to test for comparison purposes.
Hi @sfatula, we had a long discussion here, but unfortunately we spent a lot of time with someone who has no way to check it on real hardware. Anyhow, he seems convinced that the problem exists.
We are discussing the read speed of a 2 vdev 4 drive mirror layout, and of mirrors generally. Please help!
From your posts, you are able to check this on real hardware, and additionally you do not believe the problem exists. So if you can reproduce it, we are one step further.