Good day to everyone! I am running into a bottleneck, mainly on writes, and can't seem to isolate it, so I'm looking for help and guidance. I have read many (and I mean many) forums and spent the last few weeks and tens of hours investigating this, and I'm losing my hair fast. Here's my setup:
- ZFS + Ubuntu builds, 128K record size:
- Build #1: Cisco UCS C220-M4, 2x Intel Xeon 2.5GHz 12-core, 32GB 2133MHz RAM, 2x 120GB SATA in RAID-1 for Ubuntu (SLOG and L2ARC on separate partitions of the same drives), 1 vdev of 6x 960GB Samsung SSDs
- Build #2 and #3: Cisco UCS C240-M4, 2x Intel Xeon 2.4GHz 14-core, 256GB 2400MHz RAM, 2x 480GB SATA in RAID-1 for Ubuntu (SLOG and L2ARC on separate partitions of the same drives), 2x vdevs of 11x 1.8TB 10K Seagate Enterprise drives
- TrueNAS build, 128K record size:
- Cisco UCS C240-M4, 2x Intel Xeon 2.4GHz 6-core, 128GB 2400MHz RAM, 2x 120GB SATA in RAID-1 for TrueNAS (mirrored by TrueNAS, not the RAID controller), 2x vdevs of 11x 1.8TB 10K Seagate Enterprise drives, 2x 960GB Samsung SSDs for SLOG (mirrored) and L2ARC
- Network setup for several hundred VMs:
Cisco UCS B200 blades running ESXi 7.0 U3n → Cisco 6248 FIs → Nexus 5548s → 2x 10G to the ZFS/TrueNAS storage
- vCenter to manage all ESXi hosts
- iSCSI to storage with multipathing enabled, round robin set to 1 IO per path change instead of the default of 1,000 (commands shown below); each iSCSI path has its own dedicated VLAN and subnet
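  For anyone who wants to check my work, this is how the round robin tweak was applied per device (naa.xxxxxxxx is a placeholder for the real device ID):
    # set the path selection policy to round robin
    esxcli storage nmp device set --device naa.xxxxxxxx --psp VMW_PSP_RR
    # switch from the default 1,000 IOs per path to 1 IO per path
    esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxx --type iops --iops 1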
- Reads are super fast with all the ARC and L2ARC, so here's my issue: across the 4 builds listed above I hit almost the same "real-world" write throughput limit in my virtualized environment, maxing out at ~55MB/s of total write bandwidth. I can confirm this ceiling by taking 55MB/s and subtracting the current average write bandwidth for the entire pool from "zpool iostat -v": any new file transfer to the storage, or any unzip/uncompress, gets exactly the remaining throughput. For example, with a current average of 40MB/s, a file transfer/unzip gets 55MB/s - 40MB/s = 15MB/s; a lower current average allows faster file activity, while a higher average results in slower file activity. A check with "zpool iostat -pr" shows my sync and async writes are ONLY in the 4K - 1M block range, nothing smaller or larger (the monitoring commands I've been using are below). I have tested this against Windows and Linux virtual machines with no difference. What's really confusing is that the all-flash build #1 (1 vdev of 6x 960GB SSDs) gets the same performance as the other 3 builds.
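  For reference, these are the views I've been watching ("tank" is a placeholder pool name); the latency and queue flags should show whether the disks themselves or something upstream is the holdup:
    # pool-wide bandwidth, refreshed every 5 seconds
    zpool iostat -v tank 5
    # average wait and disk latencies
    zpool iostat -l tank 5
    # request size histograms (this is what showed me the 4K - 1M spread)
    zpool iostat -pr tank 5
    # queue stats, to see whether IOs are backing up inside ZFS or at the disks
    zpool iostat -q tank 5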
- I have removed the SLOG from the SSDs and allowed the ZIL to live on the spinning disks, no change (one more sync-related test I still want to try is sketched below)
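  One follow-up test I still want to run, strictly as a temporary diagnostic since it is unsafe on power loss: disabling sync writes entirely to see whether the ceiling is sync/ZIL related at all. A sketch, with tank/vmstore as a placeholder dataset:
    # check the current sync policy
    zfs get sync tank/vmstore
    # disable sync writes -- diagnostic only, do not leave this set in production
    zfs set sync=disabled tank/vmstore
    # re-run the write test, then put it back
    zfs set sync=standard tank/vmstore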
- I have moved the storage to the 6248 FIs, bypassing the Nexus 5548s, no change
- I have removed an ESXi host from vCenter and saw a 30% increase in writes, to 71MB/s, but I think it's related to vCenter adding delays through cluster resource sharing or even DRS; in vCenter I did try cranking the storage IO limit from 100k to 500k, and no change there either (the per-device queue settings I still want to rule out are below)
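  Along the same lines, I still want to rule out the per-device outstanding IO limit on each host (DSNRO, which I believe defaults to 32; naa.xxxxxxxx is again a placeholder):
    # show the device's queue settings, including "No of outstanding IOs with competing worlds"
    esxcli storage core device list -d naa.xxxxxxxx
    # raise the limit for the device as a test
    esxcli storage core device set -d naa.xxxxxxxx -O 64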
- ATTO testing at a queue depth of 256 confirms multipathed reads and writes can easily saturate the 2x 10Gbps links at 64K and larger block sizes; anything lower gets a mix of a few hundred MB/s but still tons of IO/s (a local fio test to take the network out of the picture is sketched below)
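  To separate the network path from the pool itself, the plan is a sequential write run directly on the storage box, bypassing iSCSI entirely; a sketch with /tank/test as a placeholder path and parameters that are just my starting assumptions:
    # 128K sequential writes with an fsync at the end so the result isn't just RAM
    fio --name=seqwrite --directory=/tank/test --size=10G \
        --bs=128k --rw=write --ioengine=libaio --iodepth=32 \
        --numjobs=1 --end_fsync=1 --group_reporting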
- I even built a single vdev with 4x 1.8TB Seagate drives; I would have imagined this would have really hurt write performance, but no change
- I have tried different record sizes of 32K and 64K; the extent block size should be 4K if I'm not mistaken (ESXi 6.7 and later supports 4K-native devices), but I did try 512B through 4K and again… no change (the commands I've been using for this are below)
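  For completeness, this is how I've been checking and changing those (tank/vmstore is a placeholder; recordsize only applies to newly written blocks, and a zvol's volblocksize can only be chosen at creation):
    # filesystem-backed extent: recordsize is changeable, but only affects new writes
    zfs get recordsize tank/vmstore
    zfs set recordsize=64K tank/vmstore
    # zvol-backed extent: block size is fixed when the zvol is created
    zfs create -V 2T -o volblocksize=64K tank/vmstore-zvol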
- I did try different metadata sizes from 32K to 1M, but no change
Would love to know what troubleshooting steps I could take to find the source of this issue, or whether there's a best practice for this sort of setup where I may have overlooked a simple setting in ESXi or ZFS/TrueNAS. If you made it this far - THANK YOU, and I look forward to chatting!