Configuration suggestions for failover VM storage

We have a Dell server with 12x 6TB data drives (not SSD) configured as RAIDZ2. We have used the storage mostly for NFS mount points (not VMs, just backup storage) but want to repurpose it as a failover for our XCP VM storage (using the Continuous Replication feature of XCP to replicate VMs to the storage in the event of disaster). We will be using NFS with sync writes.

I’ve read a couple of articles / forum posts and it appears that our best solution here is:

  1. As the storage itself is not SSD and our primary connection will be NFS with sync writes, we should get SLOG devices along with a ZIL to improve performance - the SLOG device needs to be as fast as possible, and we are considering the Intel Optane NVMe
  2. We need to change from RAIDZ2 to mirrors as this will give better performance
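
For anyone weighing the same trade-off, here is a rough sketch of what point 2 costs and gains. It assumes the 12 drives currently form a single RAIDZ2 vdev (not stated above) and uses the common rule of thumb that random IOPS scale with the number of vdevs. The function names are made up for illustration, and the capacities ignore ZFS overhead and padding.

```python
# Rough comparison of two layouts for 12 x 6 TB drives.
# Assumption: random IOPS scale roughly with the number of vdevs,
# since each vdev behaves about like one of its member disks.

DRIVES = 12
DRIVE_TB = 6


def raidz2_layout(drives: int, drive_tb: int) -> dict:
    """Single RAIDZ2 vdev: two drives' worth of space goes to parity."""
    return {"vdevs": 1, "data_tb": (drives - 2) * drive_tb}


def mirror_layout(drives: int, drive_tb: int) -> dict:
    """Striped two-way mirrors: one vdev per pair of drives."""
    pairs = drives // 2
    return {"vdevs": pairs, "data_tb": pairs * drive_tb}


raidz2 = raidz2_layout(DRIVES, DRIVE_TB)
mirrors = mirror_layout(DRIVES, DRIVE_TB)

print("RAIDZ2 :", raidz2)
print("Mirrors:", mirrors)
```

So mirrors trade a good chunk of raw capacity for more vdevs, which is what helps VM-style random I/O.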

Some questions: if the SLOG is NVMe, should I try and get an NVMe ZIL as well? I see the SLOG device should be 16GB overprovisioned (Intel Optane has 16GB drives, which I assume would suffice). What size should the ZIL be?

Are there any other recommendations you would make? As mentioned, it is for a failover environment, which we know will be slower than the primary production VMs.

Any other suggestions would be greatly appreciated.

You’re confusing some things here.
To quote from the TrueNAS documentation:

By default, the short-term ZIL storage exists on the same hard disks as the long-term pool storage at the expense of all data being written to disk twice: once to the short-term ZIL and again across the long-term pool.
Because each disk can only perform one operation at a time, the performance penalty of this duplicated effort can be alleviated by sending the ZIL writes to a separate ZFS intent log or SLOG, or simply log. While using a spinning hard disk as SLOG yields performance benefits by reducing the duplicate writes to the same disks, it is a poor use of a hard drive given the small size but high frequency of the incoming data.

So the NVMe ZIL you want to add is the SLOG…

The SLOG basically does the ZIL work.

Thank you, I am quite confused by the two. My understanding was that the SLOG acts as a “storage based” cache (which by default is handled by the server memory, of which we have 128GB), so a SLOG on SSD would likely be slower than the system RAM but provide a bit of safety if the server crashes / loses power. The ZIL is where data from the SLOG is written before it is committed to the persistent storage, allowing NFS to perform faster as it would not have to wait for the persistent storage to complete the write first (as the persistent storage is slower).

It's not a cache, as it's always written to and never read from (in a steady state).

Note that the following is true for sync writes:

When an application writes data to TN it then waits for TN to acknowledge the write. TN will only acknowledge the write after the data has been written to permanent storage as well as to RAM. That permanent copy is the ZIL. Normally the ZIL is on the same disks as the storage, which causes a significant slowdown. If you put the ZIL on a SLOG which is more performant than the main pool then you decrease the slowdown, potentially significantly, as you are writing to a dedicated vdev that has better performance characteristics. Of course, in a steady state the ZIL is never read from, only written, as TN will use the memory copy of the data to actually write it to the final location as part of its usual transaction processing, where it batches writes for efficiency.

A good SLOG has the following characteristics:

  1. Faster than the existing pool with less latency
  2. Superb endurance, as all sync writes go to the SLOG and it's never read from
  3. Power Loss Protection so that data can be maintained in the event of a sudden unexpected power outage which is the only time that a SLOG is read from - to see if any writes have been lost when the power went out

In terms of speed:
Sync writes (no SLOG) are the slowest.
Sync writes (SLOG) are an improvement on sync writes without a SLOG, but not as good as
async writes, which are acknowledged as soon as the data reaches the RAM transaction log, before it gets written to any permanent storage.

However, in a sudden power outage / kernel dump etc. you can lose writes in memory. If using sync writes, then a SLOG saves those writes, to be replayed / reloaded on boot. Perfect for databases and virtual disks.
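
To make the ordering above concrete, here is a toy model of when a write gets acknowledged in each mode. It is only a sketch: the latency figures are invented placeholders (roughly "RAM is near instant, HDD is milliseconds, a fast SLOG is tens of microseconds"), not measurements, and `Latencies` / `ack_latency_us` are hypothetical names, not anything in ZFS or TrueNAS.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    ram_us: float   # time to land the write in RAM
    pool_us: float  # time for a ZIL write to the HDD pool
    slog_us: float  # time for a ZIL write to a dedicated SLOG


def ack_latency_us(mode: str, lat: Latencies) -> float:
    """Time until the client gets its acknowledgement, in microseconds."""
    if mode == "async":
        # Acknowledged once the data is in RAM.
        return lat.ram_us
    if mode == "sync_slog":
        # Must also be on stable storage: the ZIL lives on the SLOG.
        return lat.ram_us + lat.slog_us
    if mode == "sync_no_slog":
        # Must also be on stable storage: the ZIL lives on the pool disks.
        return lat.ram_us + lat.pool_us
    raise ValueError(f"unknown mode: {mode}")


# Placeholder numbers for illustration only.
example = Latencies(ram_us=1.0, pool_us=8000.0, slog_us=30.0)

for mode in ("sync_no_slog", "sync_slog", "async"):
    print(f"{mode:13s} ack after ~{ack_latency_us(mode, example):.0f} us")
```

Whatever the real numbers are on your hardware, the ordering stays the same as long as the SLOG is faster than the pool.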


Generally the SLOG is 16GB or 32GB, overprovisioned from a much larger drive.

I think I understand a little better now. I was under the impression I needed a “persistent ZIL” (so memory > ZIL (device) > SLOG > storage), but understand now that the ZIL is always in memory and the performance impact comes from the ZIL (without SLOG), so (memory > SLOG > storage)

So ultimately all we need is 2x SLOGs (preferably NVMe, low latency, high endurance, with PLP) set up as a mirror.

Last question (and I guess the answer is “how long is a piece of string”): is there a desired size? Most NVMe drives are 480GB+ (which I guess the larger the better), but we also don’t want to buy, say, a 2TB SLOG which will hardly ever reach 10% of its capacity. We have a 10Gb network with 128GB of RAM. As it’s mostly a failover server (so we’re expecting backups to be made to it and, worst case, for it to become operational at some point), I assume 480GB would suffice, maybe even be overkill?

I read somewhere that a recommendation is 32GB where the rest of the disk is overprovisioned?

Also, I assume that the above is never used for reading over NFS. Are there any suggestions on improving the read speed of the persistent storage?

Thank you, so a 480GB NVMe would be grossly overkill for our needs? Are there any drawbacks to making it 128GB, considering the 128GB of RAM?

Not at all. SSD Over-provisioning (OP) - Kingston Technology

You will never need that much space. Go either 16GB or 32GB. iX sells 480GB ones overprovisioned at 16GB.

I allow 20GB for a 10Gb NIC (which is way too much)
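
A back-of-the-envelope version of that sizing, as a sketch: the SLOG only has to hold the sync writes that can arrive before ZFS flushes them to the pool. The 5 second figure is the usual OpenZFS transaction group timeout, and keeping two transaction groups' worth in flight is my assumption for headroom, not an official formula. `slog_size_gb` is a made-up helper name.

```python
def slog_size_gb(link_gbit_per_s: float,
                 txg_seconds: float = 5.0,
                 txgs_in_flight: int = 2) -> float:
    """Worst-case sync data a SLOG must hold, in GB.

    link_gbit_per_s: network speed in gigabits per second
    txg_seconds:     assumed time between transaction group flushes
    txgs_in_flight:  assumed number of unflushed groups to allow for
    """
    gb_per_second = link_gbit_per_s / 8  # bits to bytes
    return gb_per_second * txg_seconds * txgs_in_flight


# 10Gb NIC running flat out with nothing but sync writes:
print(slog_size_gb(10))  # 12.5
```

Even with the link saturated, that lands under 16GB, which is why 480GB (never mind 2TB) buys you nothing except more spare cells for wear levelling.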

Also, unless you are completely paranoid you don’t need a mirrored SLOG. If it fails under normal use then ZFS won’t care, things will just slow down as ZFS will go back to writing the ZIL on the main pool.

It only matters IF you get a sudden outage AND the SLOG fails during the boot process / as a result of the power outage
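
Spelled out as a truth table (a minimal sketch of the reasoning above, with a hypothetical function name; it models a single unmirrored SLOG):

```python
from itertools import product


def sync_writes_at_risk(sudden_outage: bool, slog_failed: bool) -> bool:
    """Acknowledged sync writes are lost only when the RAM copy is gone
    (sudden outage) AND the SLOG copy cannot be read back."""
    return sudden_outage and slog_failed


for outage, failed in product([False, True], repeat=2):
    risk = sync_writes_at_risk(outage, failed)
    print(f"sudden outage={outage!s:5}  SLOG failed={failed!s:5}  -> data at risk: {risk}")
```

Only the last row, both at once, is the case a mirrored SLOG protects against.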


Thank you all for your invaluable info today, we almost made some terrible mistakes with purchases. We’re looking at getting Micron enterprise SSDs for this purpose (we have limited server storage suppliers in South Africa).

Regarding overprovisioning, I have googled a bit to understand what it is I might have to do. My basic understanding is that I just create a 32GB volume / partition on the disk and, by leaving the remainder of the storage empty, the SSD controller will treat that empty space as spare area? I won’t be getting the NVMe drives, so will not have an opportunity within Windows to use the Micron manager (or whichever brand’s tool, for that matter), so I just want to make sure what my options are here?

TrueNAS CORE will do the SLOG Overprovisioning for you, while for SCALE you have to follow these steps.

Basically, you tell the system to only use part of the drive: due to how the drive’s firmware works, it will cycle the cells in order to level the wear, thus allowing the drive to last longer.

FWIW, IMHO, the best SLOGs are still Optane drives… now only available used!

P4801X or something, not the tiny ones, which aren’t very fast.

There is a good thread on the old forum comparing many SSDs for SLOG purposes