I have googled and found the odd highly specialised post from some of the long-term members of the old forum; I’m hoping some of them exist on here.
I’ll try to summarise in short lines as I tend to go on and on otherwise.
I have one large pool consisting of 6x16TB CMR disks, Z2
The NAS has no faults or errors that I’m aware of and has worked for years; this is my 10th year on TrueNAS.
This particular pool is only 6 years old; it was re-created when I got this board in 2018.
One particular filesystem / dataset is frequently really slow to work with, PARTICULARLY when it comes to getting information about the contents of a folder (a directory listing).
Compression is off, dedup is off for this dataset
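For reference, those properties (plus a couple of others that can affect directory listings) can be confirmed over SSH with something like the following; tank/slow is just a stand-in for the real pool/dataset name:

```
# Check the dataset properties (tank/slow is a placeholder for the actual pool/dataset)
zfs get compression,dedup,atime,recordsize,primarycache tank/slow
```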
Clicking on my “P:” drive after I haven’t used it in a while regularly takes 2 to 8 seconds to produce a file listing.
So I just found an old (12-hour-old) Windows Explorer window open inside a folder on a mapped drive.
I clicked one level deeper in the folder structure, into a folder with NO special characters in its name and a single 300MB video file in it.
Simply opening this folder took approximately 10 seconds, for nothing more than a directory listing over SMB in Windows Explorer.
Once the ‘pool has woken up’ I can click around the filesystem quite quickly.
It is genuinely as if the drives ‘spun up’ after sleeping. (I do not believe this to be possible?!)
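If spin-down were somehow involved, I understand it can be checked directly: the HDD Standby setting for each disk in the UI shows whether it is even enabled, and something like the following should report a drive’s current power state, assuming the camcontrol on CORE supports these subcommands (ada2 is just an example device name):

```
# Ask the drive for its current ATA power state (example device)
camcontrol powermode ada2
# Alternative on drives that support EPC: report only the power state
camcontrol epc ada2 -c status -P
```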
Can someone recommend how to really benchmark this baby properly so I can attempt to cleverly isolate the issue? I know how to perform a variety of benchmarks from years of messing with PCs, but I feel like some kind of specialist test is in order here, one that targets something more specific.
Specs: Denverton C3758 (the 8-core one?), 64GB RAM (up from 32GB! I thought this might help)
Seagate 16TB disks which passed a plethora of extensive tests upon purchase, no SMART / ZFS / scrub faults.
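For completeness, a quick re-check of drive and pool health from the shell looks something like this (device and pool names are examples):

```
# SMART health verdict and the usual warning attributes for one disk (repeat per disk)
smartctl -a /dev/ada2 | grep -Ei 'overall-health|reallocated|pending|uncorrect'
# Pool-level error counters and the result of the last scrub
zpool status -v tank
```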
I will note the only interesting thing, though I can’t be sure whether it MEANS anything: I’m running a copy of a 120GB file from DATASET #1 to DATASET #2 (the bad one) on the same pool, which obviously causes the disks to thrash.
gstat -dp is showing me that ada2 and ada5 are generally slightly busier than the others; it might be nothing as it fluctuates around, but they do seem to be consistently above 60% busy during this giant copy.
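If it helps, per-disk latency (rather than just %busy) can be pulled from zpool iostat on reasonably recent OpenZFS; if ada2 and ada5 consistently show higher wait times than their peers during the copy, that would point at those two drives rather than at the dataset. The pool name is a placeholder:

```
# Per-device latency breakdown, refreshed every 5 seconds ('tank' is a placeholder)
zpool iostat -v -l tank 5
```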
Bear in mind, I don’t fully know what I’m doing (I’m mid-tier at best), but I want to identify this issue clearly without spending hours collecting and collating a variety of data when only 1 or 2 specialised tests might reveal it.
(It took 9.5 minutes to copy 120GB from M: to P: on the same pool.)
(It took 10.1 minutes to copy 120GB from P: back to M: on the same pool.) NOTE: this was via an SSH command at the filesystem level, not Explorer, not SMB.
I am assuming this is a Windows 11 PC that is slow?
I think this is a Windows (Microsoft) deficiency.
A lot of it has to do with Windows collecting details about user files that are mostly/only pertinent to Microsoft.
See if any of the caching… is enabled on your Windows PC.
For instance, if I have a mounted SMB share that has gone offline, Windows slows to a staggering crawl when using Explorer.
This occurs on both Windows 10 and Windows 11 and has been occurring for over a year.
Furthermore, the issue only applies to one particular dataset, whereas the others, even at their slowest, are at least 2 or 3 times more responsive.
It just took 7 full seconds to open a folder on that dataset with 434 items in it.
I’d start by looking through some of the browsing suggestions compiled in this thread started by Cyberjock in 2015. Some of the suggestions may be out of date by now or mitigated by features that were added to TrueNAS since then like sVDEVs. That said, they’re still worth looking through.
I would start by seeing how responsive a directory listing is from the TrueNAS console, i.e. is it a disk performance issue or an SMB network performance issue?
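Something as simple as timing a listing of the same folder on the NAS itself, and then again over SMB, would show where the delay lives; the path below is a placeholder:

```
# On the TrueNAS shell: time a cold directory listing of the suspect folder
time ls -la /mnt/tank/slow/somefolder > /dev/null
# Run it a second time to compare against a warm (ARC-cached) listing
time ls -la /mnt/tank/slow/somefolder > /dev/null
```

If the first run is slow on the console too, the problem is on the pool side; if it is only slow through Explorer, look at SMB and the Windows client.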
Also, I would check Reports to see what your cache hit ratio is.
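On CORE the same figures are also available from the shell; depending on the version, one or the other of these should print ARC size and hit/miss statistics:

```
# ARC size and hit/miss statistics (whichever tool your version ships with)
arc_summary
# or, if the zfs-stats package is present
zfs-stats -E
```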
10 minutes for 120GB = 12GB/minute = 200MB/s. That’s 50MB/s per data (HDD) drive (4 data disks in a 6-wide Z2)… that’s very good, considering the read and the write are hitting the same vdev at the same time.
ada1 in the third table is at 77ms per read, but only 4kB/s? I guess it’s only reading metadata and nothing is cached. Was the file written with a small or a large record size?
If I recall correctly, I was copying a 110GB file to produce these results. I always assumed the data was (kind of) evenly distributed across all 6 disks in a Z2 config, but it seems that may not be the case.