Does a pool of mirrored VDEVs benefit from an SLOG?

I assume pools composed of mirror VDEVs are still copy-on-write and therefore still maintain a ZIL, right? So such a pool could benefit from an SLOG, right?

My tentative plan: upgrading my homelab from a single TrueNAS SCALE server.

  • A new-to-me 12-bay server with 12x identical 4TB SAS HDDs, running TrueNAS SCALE for general storage and for VM storage serving a separate XCP-ng server.
  • 25G (maybe 10G) network between XCP-ng and TrueNAS.
  • An Optane 900P PCIe x4 SSD (280GB) for an SLOG.
  • Thinking about 5x 2-drive mirrors with 2 hot spares – so any two drives can fail, right? I have 4 identical drives on the shelf in caddies. Other pool configs might be 2x 6-drive raidz2 VDEVs or 1x 12-drive raidz2 VDEV.
  • The XCP-ng server is a new-to-me 8-bay with 8x of the same 4TB drives, but it will use the main TrueNAS for all VM storage. It will run a virtualized TrueNAS (HBA passed through) to manage a backup replica of the main TrueNAS (probably 8x raidz2). XO would run in a VM on the main TrueNAS SCALE server. The replica TrueNAS would have slightly more usable space than the main server.
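In zpool terms, the mirror idea would look something like this (pool name and device names are placeholders, and in practice TrueNAS SCALE would build this through the UI):

```
# Hypothetical layout: 5x 2-way mirrors, 2 hot spares, Optane 900P as SLOG.
# "tank" and the sd*/nvme* names stand in for the real devices.
zpool create tank \
  mirror sda sdb \
  mirror sdc sdd \
  mirror sde sdf \
  mirror sdg sdh \
  mirror sdi sdj \
  spare sdk sdl \
  log nvme0n1
```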

Any advice welcome.

Depending on the workload - yes.

No. If two drives in the same mirror pair happen to fail, the pool is gone. If any vdev fails, your pool is toast.

As for the SLOG - if your XCP-ng uses synchronous writes, then yes, it can profit from an SLOG.
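To spell that out a bit (the dataset name below is just an example): an SLOG only ever sees sync writes, so the first thing to check is what the dataset you export to XCP-ng is actually doing.

```
# Check how the dataset exported to XCP-ng handles sync requests
zfs get sync tank/vmstore

# sync=standard honors the client's flush/sync requests (NFS from a
# hypervisor is usually sync); sync=always forces every write through
# the ZIL, i.e. through the SLOG once one is attached
zfs set sync=always tank/vmstore

# Attach the Optane as a log (SLOG) device to an existing pool
zpool add tank log nvme0n1
```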


Yikes. Yeah. Didn't think of that. I might be able to live with that, though. I guess they'd have to die at basically the same time – before the hot spare could be resilvered into place. Right?

Right, but the resilver process might take a couple of hours …


OK. Given I have a fast SLOG, this is making me reconsider 2x 6-drive raidz2 vdevs – any 2 drives can fail in either vdev. Having 2x 6-drive raidz2 vdevs should give me a little more write performance over 1x 12-drive raidz2, right?

Mirrors are generally recommended for VM (block storage) due to better IOPS and better handling of small writes. But you have SSD: IOPS performance is high to begin with (and resilver would be short); that leaves small blocks.
Go for 5-6 mirrors, with or without hot spares, and the Optane SLOG.

Edit. Oops! Badly misread the original post.
(But I got to learn that “strikethrough” is “wrapped in double tildes”.)


Those are not SAS SSDs… so unlikely to outperform a 900p.
Even if they were SSDs, it very much depends on the exact drives and the workload. At low thread counts / queue depths the Optane would still be better; at higher values it would remain to be seen.

A 900p is unlikely to exceed 10G in sync writes unless you really use it heavily with many parallel write processes.
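If you want to see where that ceiling sits on your own hardware, a sync-write fio run along these lines (paths and sizes are arbitrary) can be repeated with different --numjobs values to see how throughput scales with parallel writers:

```
# Random sync writes against a test directory on the pool; rerun with
# --numjobs=1, 4, 16 to see how much parallelism the SLOG needs to shine
fio --name=syncwrite --directory=/mnt/tank/fiotest \
    --rw=randwrite --bs=16k --size=4G --runtime=60 --time_based \
    --ioengine=psync --sync=1 --numjobs=4 --group_reporting
```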


I hear ya, and that was my original thought. I've been doing more reading. I really don't like the idea of the whole pool being vulnerable to a single drive failure during the resilver process, especially with my 8-year-old used SAS drives. Sure, they're HGSTs rated at 2M hours MTBF, but still. Even if they were new, that'd bother me. I was almost convinced on the mirroring until reading more on that. I understand the flexibility of mirroring too, but I'm totally willing to sacrifice a little performance for a lot of resilience – and get a little more usable storage at the same time with 2x raidz2.

Good to know. I was unsure. These are Dell R720 servers (one is an R720xd), and the 25G Mellanox C4 dual-port daughter cards were $20/ea, I guess because they only fit Dells. I also have 2x Intel X520 10G SFP+ regular NICs, and DAC cables for both 25G and 10G, so one or the other will work out. They're just gonna be NIC-to-NIC, no switch, for storage traffic only. I also have a couple of Intel 4-port 1G NICs that will go to my switch for regular access.

So, yes, mirror vdevs can benefit from SLOG. And regarding my other plans…

ZFS Raidz Performance, Capacity and Integrity

Just rediscovered this article. He’s using 4TB drives similar to mine, and the same controller chipset, and did a reasonably scientific test comparing different pool layouts. Here are the 12 disk configurations (no slog, no l2arc, no compression).

ZFS Raid Speed Capacity and Performance Benchmarks
                   (speeds in megabytes per second)

...
12x 4TB, 6 striped mirrors,    22.6 TB,  w=643MB/s , rw=83MB/s  , r=962MB/s 
12x 4TB, 2 striped 6x raidz2,  30.1 TB,  w=638MB/s , rw=105MB/s , r=990MB/s 
12x 4TB, raidz (raid5),        41.3 TB,  w=689MB/s , rw=118MB/s , r=993MB/s 
12x 4TB, raidz2 (raid6),       37.4 TB,  w=317MB/s , rw=98MB/s  , r=1065MB/s 
12x 4TB, raidz3 (raid7),       33.6 TB,  w=452MB/s , rw=105MB/s , r=840MB/s 
...

These results indicate that a pool of 6x 2-drive mirror vdevs is roughly equivalent in performance to a pool of 2x 6-drive raidz2 vdevs, and both of those showed superior performance to a single 12-drive raidz2 vdev, except in reads. But as @pmh pointed out, the mirror layout is vulnerable to losing the entire pool if the second drive in a mirror fails during resilver.

So, I think my mind is made up. I'll go with a pool of 2x 6-drive raidz2 vdevs and the Optane SLOG. The server has 256GB RAM (and 20/40 2.8GHz cores), so that's a pretty good ARC. The replica server will just be 1x 8-drive raidz2. I'll just have to make sure I don't fill up the replica, since it doesn't have quite as much usable space as the main system does.
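For the record, roughly what that final layout amounts to (device names are placeholders; I'd be building it through the TrueNAS UI anyway):

```
# Main pool: 2x 6-wide raidz2 plus the Optane 900P as SLOG (names are placeholders)
zpool create tank \
  raidz2 sda sdb sdc sdd sde sdf \
  raidz2 sdg sdh sdi sdj sdk sdl \
  log nvme0n1

# Replica pool on the virtualized TrueNAS: 1x 8-wide raidz2, no SLOG
zpool create backup raidz2 sda sdb sdc sdd sde sdf sdg sdh
```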

Thanks guys!

Those tests might not be 100% applicable to your use case since

  1. Depending on your workload (number of VMs), your IOPS requirement might be much higher (more than 4 concurrent writes, mixed r/w access, etc.)
  2. The SLOG helps your case (potentially converting random writes into [more] streaming writes)

Again, this very much depends on your workload (number of VMs, activity in the VMs, lots of reboots or not).

Just run some tests before you commit to a final solution. Or build it out, try it, and if it's not working, just recreate it with the help of your backup system :slight_smile:

Good luck.


Yes, since these two servers have yet to be put into use, I can play with various configurations. Definitely gonna do that. I'm mostly interested in VM storage over NFS, so that's likely limited by random sync write performance. We'll see how much the SLOG masks the write performance (or lack thereof) of the underlying pool configuration. Once everything is all migrated and happy, I'll sell the existing server.
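One way I figure I can bracket the SLOG's effect (dataset name is just an example) is to run the same benchmark against the NFS dataset under different sync settings:

```
# Same fio/VM workload, three sync settings on the exported dataset
zfs set sync=always tank/vmstore    # everything goes through the ZIL/SLOG
# ...run the test, note the numbers...
zfs set sync=disabled tank/vmstore  # upper bound: skips the ZIL entirely (unsafe, testing only)
# ...run the test again...
zfs set sync=standard tank/vmstore  # back to normal when done
```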

If you happen to be handy with Perl, you could use a script I created a few years ago.

No idea if it still runs, so it might need adjustments, and it needs some manual in-script configuration, but it was made exactly for a situation like yours –

take n disks and m layouts and run a million tests on them until you drown in data ;)
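If Perl isn't your thing, the same idea fits in a few lines of shell – this is just a minimal sketch (device names and layouts are placeholders), not the script itself:

```
#!/bin/sh
# For each candidate layout: build a throwaway pool, run one fio pass, destroy it.
for layout in \
  "mirror sda sdb mirror sdc sdd mirror sde sdf" \
  "raidz2 sda sdb sdc sdd sde sdf" ; do
    zpool create -f testpool $layout          # word-splitting on $layout is intentional
    fio --name=bench --directory=/testpool \
        --rw=randwrite --bs=16k --size=2G --runtime=60 --time_based \
        --ioengine=psync --sync=1 --numjobs=4 --group_reporting
    zpool destroy testpool
done
```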

It's worth noting that the device you intend to use for the SLOG matters a lot, especially if you're trying to improve sync write performance (the typical use case for an SLOG). You need an SSD with PLP (power loss protection), which nearly all consumer SSDs lack.

Mirrored LOG devices are probably overkill for most circumstances. Sync writes are already inherently redundant.
One of my favorite tech journalists explains what ZFS is doing under the hood.
ZFS sync/async + ZIL/SLOG, explained – JRS Systems: the blog (jrs-s.net)

The SLOG is almost never read from. It really exists for only one reason:

  • System crash or unclean shutdown: if the system crashes before the transactions recorded in the SLOG have been committed to the main storage pool, ZFS will read from the SLOG during the import process to ensure that any uncommitted data is properly recovered and written to the main pool.

If the SLOG (the drive hardware itself) has a problem during normal operation, you will get ZFS alerts and you can just remove it from the pool. So, IMO, one properly power-protected device is really the best option. Mirrors aren't going to hurt you, but a single good device just seems like money better spent.
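As a rough sketch of what that looks like in practice (pool and device names are placeholders):

```
# Errors on the log device show up in the pool status like any other vdev
zpool status tank

# A non-mirrored log vdev can be removed from a live pool at any time;
# ZFS simply falls back to the in-pool ZIL on the data vdevs
zpool remove tank nvme0n1
```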


My understanding is that the Optane doesn't have a DRAM cache to keep alive and is very fast, so writes are nearly atomic and it doesn't need PLP as much as regular SSDs do. I know the higher-end enterprise Optanes have real PLP, but they are also very expensive. I found an open-box deal on this Optane 900P and grabbed it.

Also, my understanding is that if the SLOG device fails, ZFS just reverts to writing the ZIL to the space reserved for it on the data vdevs.

For home/lab use and things like that, I've been buying up and hoarding Optane every few months – the little 118GB and 58GB enterprise-grade ones. These do have "Enhanced" Power Loss Protection:
Intel® Optane™ SSD P1600X Series

Intel Optane P1600X 118 GB Solid State Drive M.2 2280 Internal SSD PEK1A118GA01 735858481557 | eBay
I'd probably use one as an SLOG for the right build/use. As for not having PLP, for your homelab it's probably fine? But that's a risk you're just going to have to be aware of.

That understood, I don't see the harm in using a 900p for an SLOG. I actually have a 280GB one of those as well, and it's on my list of things to tinker with.

Of course, insert the usual PSA: you should buy a drive with proper PLP, like the P4801X. Which, unfortunately, is still only available from Chinese vendors at reasonable prices.
INTEL 375GB SSD DC P4801X OPTANE M.2 22110 PCIE X4 SSDPEL1K375A01 60DWPD 41PBW | eBay

Even then, it's so close in pricing (roughly $1/GB vs $1.25/GB) to P5800Xs on eBay right now that I'm not sure it's worth the trouble of waiting 3 weeks for delivery.
53M3R DELL Intel Optane P5800X 400GB U.2 NVME PCIE 2.5in SSD SSDPF21Q400GBT New | eBay

TL;DR: 900Ps and 905Ps are probably better as data drives.

Yes.


This is true, but data loss isn't the reason I mentioned it. It's because, in order for sync writes to actually be performant, the device needs PLP. I didn't mention it for data safety, though that is partially why it's required.

If you try to use a device without it for sync writes (i.e. cheap consumer drives), you will see sync writes CRAWL if you do any heavy sustained writes large enough to exhaust the write cache.

Just search for "slow sync writes" on Google and you'll see a lot of people asking this question. Here's one thread from the Proxmox forums where the OP incorrectly assumed "any SSD = fast sync writes", which he learned the hard way is false.


There are a lot of cheaper offers from China like this one
375GB SSD INTEL P4800X U2 SSDPE21K375GA01 DWPD Solid State Drive Origial New | eBay.

I bought a few from this guy a couple of months back and they're all fine.
They're working well, the warranty is valid, performance checks out, and ESXi didn't complain either (for vSAN), so I'm pretty happy with them.

That said, I've never had an issue with a 900p as an SLOG either.

Great price! But a server dismantler near me has 400 GB and 800 GB Dell-rebranded Optane DC P5800X drives for 399 € and 599 €. :stuck_out_tongue:
