Ok, this is going to be long-winded, and more than likely I'm just overthinking things, but I would like some input and recommendations on this.
Because of the number of files housed on my TrueNAS box (7M+), as well as the amount of data (30TB+), I'm planning on redoing some settings, and possibly adding a sVDev focused primarily on housing metadata rather than small files (i.e., increasing the percentage of the sVDev devoted to metadata). The goal is to reduce backup times for my pool (I'm building a second TrueNAS strictly to mirror [back up] the primary NAS).
After reading through special-vdev-svdev-planning-sizing-and-considerations, I'm still unsure which recordsize would work best on my system. The data is almost 100% fixed; there are no VMs or databases running off the NAS, so I shouldn't need a small recordsize. But because of the number of "small" files in the mix, I also shouldn't use too large a recordsize?
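For reference, these are the two dataset properties in play (the pool/dataset name `tank/data` below is just a placeholder for whatever your layout uses). With `special_small_blocks` left at its default of 0, the sVDev holds metadata only; raising it pulls small file blocks onto the sVDev as well. Note that `recordsize` only affects newly written files:

```shell
# Placeholder dataset name -- substitute your own pool/dataset.
# Metadata-only special vdev: special_small_blocks stays at its default of 0.
zfs set special_small_blocks=0 tank/data

# Or, to also store file blocks of 32K and under on the special vdev:
# zfs set special_small_blocks=32K tank/data

# recordsize is an upper bound per file and applies only to data written
# after the change; existing files keep their current block size.
zfs set recordsize=1M tank/data
```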
This is what ZFS outputs for my pool with the default 128K recordsize:
Block Size Histogram

  block   psize                  lsize                  asize
   size   Count   Size   Cum.    Count   Size   Cum.    Count   Size   Cum.
    512:  13.3K  6.65M  6.65M    13.3K  6.65M  6.65M        0      0      0
     1K:   445K   460M   466M     445K   460M   466M        0      0      0
     2K:  41.0K   102M   568M    41.0K   102M   568M        0      0      0
     4K:  5.55M  22.5G  23.1G     136K   889M  1.42G        0      0      0
     8K:  3.09M  35.3G  58.4G    1.64M  19.6G  21.0G     537K  4.19G  4.19G
    16K:  1.19M  25.1G  83.5G    1.16M  20.0G  41.1G    7.84M   134G   138G
    32K:  2.33M   108G   192G    5.49M   180G   221G    2.99M   117G   255G
    64K:  5.48M   513G   705G     295K  27.7G   249G    4.19M   396G   650G
   128K:   236M  29.6T  30.3T     245M  30.7T  30.9T     239M  37.3T  37.9T
   256K:      0      0  30.3T        0      0  30.9T      289  96.4M  37.9T
   512K:      0      0  30.3T        0      0  30.9T        0      0  37.9T
     1M:      0      0  30.3T        0      0  30.9T        0      0  37.9T
     2M:      0      0  30.3T        0      0  30.9T        0      0  37.9T
     4M:      0      0  30.3T        0      0  30.9T        0      0  37.9T
     8M:      0      0  30.3T        0      0  30.9T        0      0  37.9T
    16M:      0      0  30.3T        0      0  30.9T        0      0  37.9T
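Just to put a number on how lopsided that histogram is, here's a quick back-of-the-envelope check using the asize (allocated) cumulative column above, with the two figures hardcoded from the table:

```python
# Figures transcribed from the asize column of the histogram above:
# 650G allocated cumulatively through the 64K bucket, 37.9T total.
GIB_PER_TIB = 1024
cum_below_128k_gib = 650                 # asize cumulative through 64K
total_gib = 37.9 * GIB_PER_TIB           # asize cumulative at 128K and above
frac_small = cum_below_128k_gib / total_gib
print(f"{frac_small:.1%} of allocated space sits in blocks under 128K")
# → 1.7% of allocated space sits in blocks under 128K
```

So by capacity, well over 98% of what's allocated is already in full 128K blocks.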
This is also the breakdown of file sizes in my pool. Logical = number of files by actual file size; Physical = number of files by the block size they occupy in the pool:
Size     Logical Count   Physical Count
0-512            11234             4282
1K                4258                0
2K                8363                0
4K               14104                0
8K              187497            40934
16K            1736694          1034971
32K             359377          1145562
64K             308586           355686
128K            302034           313125
256K            634766           600854
512K           1108517          1133355
1M             1142061          1171683
2M              749231           764100
4M              441514           444363
8M              279276           279415
16M             151689           151501
>16M            141638           141008
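For sizing a sVDev against different `special_small_blocks` cutoffs, a rough upper bound can be had from the Physical Count column above. This is only a sketch: it counts whole small files at their bucket size and ignores metadata, indirect blocks, and per-file overhead, all of which would also land on the sVDev:

```python
# Physical file counts transcribed from the small end of the table above:
# block size (bytes) -> number of files occupying a block of that size.
physical = {
    512: 4282,
    8 * 2**10: 40934,
    16 * 2**10: 1034971,
    32 * 2**10: 1145562,
    64 * 2**10: 355686,
}

def svdev_bytes(cutoff):
    """Space consumed by files whose block size is <= cutoff bytes."""
    return sum(size * count for size, count in physical.items() if size <= cutoff)

for cutoff_k in (16, 32, 64):
    gib = svdev_bytes(cutoff_k * 2**10) / 2**30
    print(f"special_small_blocks={cutoff_k}K -> ~{gib:.0f} GiB of small files")
```

Worth noting the asize histogram shows more cumulative allocation in those buckets than this file-count estimate does, which is a hint at how much metadata and other sub-128K block traffic is in the mix.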
If this last chart is confusing, using the 8K line as an example:
There are 187,497 files whose actual (Logical) file size is between 4KB+1 and 8KB.
There are 40,934 files that occupy an 8KB block of Physical space in the pool.
If I am reading everything correctly, going by the Block Size Histogram, I have 37TB+ of data mostly in the 128K recordsize range, so it seems a larger recordsize would be better. But going by the file size breakdown, I have 2M+ files out of fewer than 8M total that use less than the current 128K recordsize. So if I go too large, would I get poor performance and too much wasted space when dealing with those small files?
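To sanity-check that "2M+ out of fewer than 8M" claim, a quick sum over the Logical Count column above (first 8 buckets are the under-128K files):

```python
# Logical file counts transcribed from the table above, smallest to largest:
# 0-512, 1K, 2K, 4K, 8K, 16K, 32K, 64K (under 128K), then 128K through >16M.
logical = [11234, 4258, 8363, 14104, 187497, 1736694, 359377, 308586,
           302034, 634766, 1108517, 1142061, 749231, 441514, 279276,
           151689, 141638]
under_128k = sum(logical[:8])
total = sum(logical)
print(f"{under_128k:,} of {total:,} files ({under_128k/total:.0%}) are under 128K")
# → 2,630,113 of 7,580,839 files (35%) are under 128K
```

So about a third of the files by count sit below 128K, even though they account for only a sliver of the allocated capacity.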