Optimal Pool Layout for VMs: Striped Mirrors vs. RAIDZ?

I’m almost finished setting up a new TrueNAS SCALE system, which I’ll use as a SAN for 2x XCP-ng hosts. The server has a mix of HDDs, NVMe drives, and SSDs.

My goal is to create a tiered storage system for my VMs:

| Tier | Drives | Proposed Layout | Intended Use | Notes / Question |
|---|---|---|---|---|
| Fastest | 2x 1.92TB NVMe | Mirror | Speed-critical VMs (databases, etc.) | Can only support 2x NVMe |
| Fast | 4x 960GB SAS SSD | Seeking advice | Larger, performance-critical VMs | Stripe of mirrors or RAIDZ1/2? |
| Bulk | 5x 8TB HDD | RAIDZ2 | Backups, templates, archives | Plan to add a SLOG later. |

SSD Pool Layout: For my VM workload on the four SAS SSDs, is it better to create a pool from two striped mirrors (a 2x2 configuration), or a single RAIDZ1/RAIDZ2 vdev?

I heard that striped mirrors are better for IOPS while a RAIDZ is better for sequential writes. Is that accurate?
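For reference, here is roughly how the two layouts I'm weighing would be created. This is a sketch only: the pool name `ssd-vms` and the device paths are placeholders, not my actual devices.

```shell
# Option A: stripe of two mirrors (2x2). Random IOPS scale with the
# number of mirror vdevs; usable space is 50% of raw capacity.
zpool create ssd-vms \
  mirror /dev/sdb /dev/sdc \
  mirror /dev/sdd /dev/sde

# Option B: a single RAIDZ1 vdev. More usable space (~75% of raw),
# but the whole pool has roughly one vdev's worth of random IOPS.
zpool create ssd-vms raidz1 /dev/sdb /dev/sdc /dev/sdd /dev/sde
```
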

Here is my hardware spec:

  • Chassis: Supermicro SCE743
  • Backplane: BPN-SAS3-747TQ-N4
  • Motherboard: ASRock X570D4U-2L2T
  • CPU: AMD Ryzen 7 5700X
  • Memory: 4 x 32GB DDR4-3200 ECC UDIMM
  • HBA: Supermicro AOC-S3816L-L16iT
  • NIC: Intel E810‑XXVDA4
  • Drive Bay: CSE-M28SACB SAS3 8x SSD

For VMs, use mirrors all the way.
You need to add a SLOG to the two SSD tiers, not the bulk tier, as you will likely be using sync writes to the SSDs and async writes to the HDDs.


Just to confirm, are you recommending that both my NVMe pool and my SAS SSD pool have their own SLOG for best performance? Someone told me that SSDs and NVMe drives were fast enough that the performance increase would be negligible.

Does each mirror need a SLOG? I’m out of NVMe space, so my SLOG would need to be an SSD. Could I partition the drive and assign individual partitions as SLOGs for each pool? I also want to be able to add more mirrors to the pool later.

I’m still a bit confused about the difference between a pool and a vdev. It seems like there are two separate options in TrueNAS that look somewhat similar where you can assign log devices.

Why? 1.92 TB NVMe SSDs must be DC drives; ditto for the SAS SSDs. The ZIL on these should already be about as fast as possible.

Each pool that takes sync writes could do with a SLOG. Each SLOG is a separate device; you can’t share one without hacking ZFS around.
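On the pool-vs-vdev confusion: a pool is built from one or more vdevs, and a log device (SLOG) is just another vdev attached to a specific pool, which is why each pool needs its own. An illustrative (entirely made-up) layout:

```shell
$ zpool status ssd-vms        # "ssd-vms" is a hypothetical example pool
  pool: ssd-vms
 state: ONLINE
config:
        NAME         STATE     READ WRITE CKSUM
        ssd-vms      ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0   <- data vdev
            sdb      ONLINE       0     0     0
            sdc      ONLINE       0     0     0
          mirror-1   ONLINE       0     0     0   <- data vdev
            sdd      ONLINE       0     0     0
            sde      ONLINE       0     0     0
        logs
          nvme0n1    ONLINE       0     0     0   <- SLOG vdev
```

The two similar-looking options in the TrueNAS UI are assigning a log vdev when you build a pool versus adding one to an existing pool; both end up in the `logs` section above.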

@user000 do you know the difference between a sync write and an async write and how ZFS treats them?

The NVMe pool, whilst it would technically benefit from a SLOG, is (as you suggest) the most able to do without one due to the speed of the drives. Depending on the device you use, a SLOG may even have a detrimental effect, even while it’s still doing its job.

The SAS SSD pool might like a SLOG if you have an Optane to use for it, but just using another SAS SSD as the SLOG would have limited effect.

The HDD Pool, in your use case does not require a SLOG at all.
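If it helps, which pools actually see sync writes is something you can inspect and control per dataset/zvol via the `sync` property (dataset names below are hypothetical):

```shell
# Check the current setting. The default "standard" honours the
# client's sync requests and buffers async writes in RAM.
zfs get sync ssd-vms/databases

# Force every write to be treated as sync (safest for VM zvols):
zfs set sync=always ssd-vms/databases

# Treat everything as async on the bulk pool. Faster, but confirmed
# writes can be lost on power failure - a tradeoff some accept for
# backups/templates that can be re-copied.
zfs set sync=disabled bulk/archives
```
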

Sorry - my initial answer wasn’t specific enough

I think my general understanding was correct; I just did some more reading to confirm. With sync writes, the OS confirms the data is written successfully by waiting for a confirmation message before proceeding. With async writes, the OS only waits for the data to be written into memory.

I do have an optane drive, but unfortunately my motherboard is out of PCIe slots. So, I think I might just end up having all my pools be absent a SLOG. I don’t see a point in adding a SLOG that is SSD to an SSD pool.

That’s how the client OS handles sync writes.

When ZFS handles a sync write, it first writes it to a ZIL; once that is written, ZFS confirms a successful write back to the client OS. The important part is that for sync writes this ZIL is on permanent media as well as in memory (if a SLOG is present, the ZIL lives on that device and not on the main data vdevs; if no SLOG is present, the ZIL is a temporary location on the main data vdevs). So the requirements for a SLOG are:

  1. PLP protection
  2. As low latency as possible for writes - you don’t really care about reads
  3. Quicker for writes than the pool it’s attached to - no one cares about reads
  4. High endurance, as it’s often written to and rarely read

In a steady state the SLOG is only written to and never read, as ZFS works from memory. In the event of sudden power loss, committed writes that ZFS has confirmed back to the client can end up stored on the SLOG and NOT actually committed to the data vdevs. On the next boot, the SLOG is read for uncommitted transactions, and these are replayed to the data vdevs correctly.

Thus, speed-wise: sync < sync+SLOG < async. And it’s why Optane (the right models) makes an ideal SLOG device, thanks to PLP, ultra-low latency, and massive write endurance. Note that not all Optanes are created equal.
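Mechanically, attaching or detaching a SLOG is non-destructive, so it’s easy to experiment. Pool and device names here are placeholders:

```shell
# Attach a device as a dedicated log vdev on an existing pool:
zpool add ssd-vms log /dev/nvme1n1

# Log vdevs can also be removed later without rebuilding the pool:
zpool remove ssd-vms nvme1n1
```
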

PLP: check! Enterprise DC drives.

Requirement #5: a SLOG is only useful if it is significantly faster than the pool, in both latency and throughput.

:+1:


L2ARC can help with VMs too.


I have 2x Intel Optane P1600X Series 58GB SSDPEK1A058GA but no PCIe slots.

From memory - those are ideal SLOG devices.

That’s why I got this NVMe drive, and unfortunately, building my SAN has been a learning experience. When I originally bought the motherboard, I thought every PCIe slot was treated equally and that the physical length of a slot indicated its speed; haha, nope. The only PCIe slot I have left is a PCH slot, which hangs off the X570 chipset, not the CPU.

ASRock X570D4U-2L2T:
PCIe 4.0 x8 → Intel E810‑XXVDA4
PCIe 4.0 x8 → Supermicro AOC-S3816L-L16IT-O
PCH 4.0 x1 → Possible SLOG w/ PCIe expansion card (no mirror)

Will the PCH lane be fast enough for my SLOG? I won’t have much data traffic across the NICs connected to the PCH, and I’m only using the SATA ports for my OS drives. All my storage drives/traffic are connected to the PCIe 4.0 x8 slots.

Could I get 2x slogs on a single PCIe 4.0 x1 in a mirror? (haha, dreaming)

Make sure that SAS card gets plenty of airflow. It’s a Supermicro case, but a tower, so I have no idea; I just know that SAS cards need significant airflow.

Yup, I got that sorted out with sensors and a bunch of 3d printed parts to add fans.

How about the speed needed for a SLOG?

I don’t know. You are reducing the drive’s PCIe 3.0 x4 link down to x1, which will obviously have an effect. I can only suggest you install one, turn it into a pool, and test with fio. It shouldn’t affect latency, but it will affect throughput.
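Something along these lines would approximate ZIL-style traffic, i.e. small synchronous writes at queue depth 1, against a test pool. The mount point `/mnt/slogtest` is hypothetical:

```shell
# 4K synchronous writes, one job, queue depth 1 - roughly what a SLOG
# sees. Watch the completion-latency (clat) figures, not just MB/s.
fio --name=sloglat --directory=/mnt/slogtest \
    --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --ioengine=sync --fdatasync=1 \
    --size=1G --runtime=60 --time_based
```
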


No. Latency may be more important than throughput, but x1 through the PCH, whose own x4 uplink is shared with everything else, is far from ideal.
The only reasonable use for a x1 slot is a boot device.
