I have a few questions regarding a SLOG device setup on my NAS and need some clarification:
1. Is SLOG only required if I’m using VMs and databases?
2. Is SLOG only required for sync writes?
3. Do NFS, iSCSI, and SMB support sync writes?
4. Does the SLOG have to be redundant like the data pool? Does it also need to be configured in RAID-X to prevent data loss?
5. For a pool that is all SSD-based, will a SLOG help at all?
6. What size should I use? How do I even calculate what size I need?
7. Unlike L2ARC, are there any minimum RAM requirements for adding a SLOG device?
8. Can a SLOG device be added later, or can it only be created during pool creation, like sVDEVs?
9. Can a SLOG device fail with the pool continuing to operate without any impact? If so, what happens if data is being read from or written to the data pool at the moment the SLOG device fails?
Most of it is covered by the following articles. In short: SLOG is not a write cache. You only need it for sync writes, so databases, block storage (iSCSI, zvols for VMs), and NFS. Larger SLOG devices can be underprovisioned, and they should have PLP (power-loss protection) and proper write endurance. In general, a SLOG should be as fast as or faster than your pool.
BASICS
Special VDEV (sVDEV) Planning, Sizing, and Considerations
VMs and DBs are definitely two common workloads that demand performance with write-safety, but there are others. Basically, anything where you can’t potentially “replay” the client-side write.
Yes. An SLOG will do nothing for non-sync write workloads.
Sync is enforced at the ZFS level. NFS defaults to synchronous write behavior for most workloads, while iSCSI and SMB default to async. You can override/enforce this by setting sync=always on the dataset/zvol.
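For reference, the sync behavior is a standard per-dataset/per-zvol property set with `zfs set` (the pool and dataset names below are placeholders for your own):

```shell
# Force all writes on a dataset to be synchronous (they will hit the SLOG/ZIL):
zfs set sync=always tank/vmstore

# Revert to honoring whatever the client application requests (the default):
zfs set sync=standard tank/vmstore

# Check the current setting and where it was inherited from:
zfs get sync tank/vmstore
```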
Redundancy is optional - we’ll talk about this more in #9.
If the SLOG SSD is faster than the pool’s SSDs (e.g. an NVMe SLOG in front of SATA data disks), then potentially yes.
SSD size is mostly a secondary consequence of needing a device that is fast (multiple NAND channels) and high-endurance (lots of NAND to spread the writes across). In general you will only ever use 16-32GB of your SLOG, but you’ll probably end up with a larger device for the aforementioned performance and endurance reasons.
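As a rough sketch of where the 16-32GB figure comes from: the SLOG only needs to hold the dirty data that can accumulate between transaction group commits, which is bounded by your ingest rate times the txg interval (5 seconds by default), plus headroom for a couple of in-flight txgs. The numbers below assume a 10GbE link saturated with sync writes:

```shell
# Rough SLOG sizing: max ingest rate x txg interval x in-flight txgs.
ingest_mb_per_s=1250   # ~10GbE line rate in MB/s
txg_timeout_s=5        # default zfs_txg_timeout commit interval
inflight_txgs=2        # headroom for overlapping txg commits

slog_mb=$((ingest_mb_per_s * txg_timeout_s * inflight_txgs))
echo "${slog_mb} MB"   # ~12500 MB, i.e. ~12.5 GB - hence the 16-32GB guidance
```

A faster ingest path (e.g. multiple bonded links) scales the requirement linearly, but it rarely exceeds a few tens of GB.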
Technically no, but your allowance of ZFS dirty data (pending uncommitted writes) will eat into RAM a bit - so if you allow 32GB of “pending writes” to burst in, that’s 32GB of RAM you’ll be using as well.
SLOG can be added or removed dynamically at any time including on a live pool.
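For reference, attaching and detaching a log vdev is a one-liner each way on a live pool (pool and device names are placeholders):

```shell
# Add a SLOG to an existing, running pool:
zpool add tank log /dev/nvme0n1

# Remove it again later - the pool falls back to the in-pool ZIL:
zpool remove tank nvme0n1

# Confirm the log vdev's presence and state:
zpool status tank
```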
Here’s where we get into redundancy, so prepare for paragraphs:
The ZFS write flow uses the SLOG/ZIL as a “second copy” - the writes land into RAM and are copied to the SLOG/ZIL. Unless there is a catastrophic failure (kernel panic, power loss, other crash) the SLOG is never read from. A SLOG failing without a system failure means the pending writes in RAM get committed to disk (as per usual) but you revert back to the in-pool ZIL which has (usually significant) performance implications.
Where you may want redundancy is for the rare-but-not-impossible scenario where you have the system crash and your SLOG device fails at the same time. If this happens, ZFS will attempt to replay the logged transactions from the SLOG device on the next pool import - but if the device is absent or failed, ZFS will throw an error and tell you that it can’t replay to a consistent state. At that point, you can manually override but this means rolling back to the previous transaction group which does mean that the pending data on the SLOG would be lost.
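If you decide that crash-plus-SLOG-failure window matters for your workload, the mitigation is simply to make the log vdev a mirror (device names below are placeholders):

```shell
# Add the SLOG as a two-way mirror, so a single device failure during
# crash recovery doesn't cost you the logged transactions:
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
```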