Hello, I have been having an issue with my pool when I try to install a new drive in my server. I currently have six drives, including my boot drive, and I am trying to add a new NVMe drive to use as a log drive. When I install the drive and boot up the server, my pool completely breaks and is no longer accessible, but when I shut down and take the NVMe out, it boots up just fine with no issues. Am I doing anything wrong, or could this potentially be a hardware limitation?
You did not specify the hardware.
On some motherboards the M.2 slot shares lanes with some SATA ports, so you can only use one or the other.
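If you want to confirm that is what is happening, compare which drives the OS actually detects with and without the NVMe installed. On a Linux-based system, something like

    lsblk -o NAME,MODEL,SIZE,SERIAL

run in both configurations should show two SATA drives disappearing when the M.2 slot is populated, if port sharing is the cause. (On FreeBSD-based systems, camcontrol devlist gives a similar listing.)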
Ahhhh, thank you so much! Sorry I am a bit new to this stuff and didn’t even know this was a thing, but I read through my board manual and found this: “When the M.2_1 Socket 3 is operating in SATA or PCIE mode, SATA6G_5/6 ports will be disabled.”
Are you sure that you need an SLOG?
What workloads are you running that require synchronous writes?
Um, honestly I am not totally sure if I actually need it. One of my main workloads that I was hoping this would help with is writing a lot of large files to the pool that exceed the RAM cache I have available.
The ZFS “write cache” (the dirty data limit) is capped at 4GB by default on systems with at least 16GB of RAM. As I understand it, this “cache” has no bearing on sync writes.
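If you want to see what the limit actually is on your own system, the dirty data maximum is exposed as a tunable (assuming stock OpenZFS parameter names; the value is in bytes):

    cat /sys/module/zfs/parameters/zfs_dirty_data_max    (Linux)
    sysctl vfs.zfs.dirty_data_max                        (FreeBSD)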
SLOG only helps with sync writes. You can look at the number of sync writes during “high load” via sudo zpool iostat <your-pool-name> -lv -y 1 30.
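If you would rather see operation counts per queue than latencies, and check which sync policy your datasets currently use, something like this may also help (with <your-pool-name> as a placeholder); the -q flag breaks pending/active operations out into the sync and async read/write queues:

    sudo zpool iostat -qv -y <your-pool-name> 1 30
    sudo zfs get sync <your-pool-name>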
Ah, so I think it would be useful for me. Is there any downside to this if I just have a spare small NVMe sitting around?
Well, you still didn’t mention your type of workload with sync writes.
AFAIK, an SSD without PLP (power-loss protection) can cause loss of recently written data in the case of a power failure. And the whole idea of sync writes is to protect recently written data from an unexpected failure. So, while your sync writes would become faster, they would be less “safe” (with an SSD without PLP).
I could be wrong, though; I’ve never used a SLOG myself.
Moreover, if your SLOG vdev consists of only one drive and that drive fails during sync writes, you would lose your recent writes, which, again, negates the purpose of the sync writes in the first place.
You’re wrong on this one: unless the SLOG has already failed when the pool is imported after an unclean shutdown, failure of a single-drive SLOG causes NO data loss; the pending writes are still committed from RAM.
So SLOG can be a single drive. But PLP is a must.
So I was only partly wrong on this one, because if the drive fails at the moment of the power failure (which is not that unlikely), you will still lose recent writes.
Tbh, I wanted to write it in that post but changed my mind.
In general terms, you should only set sync=always on zvols (virtual disks, iSCSI) or on datasets containing other virtual disks, database files, or files that are randomly accessed in 4KB or 8KB blocks.
Normal sequential writes should use sync=standard so that they are asynchronous except for the fsync at the end.
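As a rough sketch (the pool and dataset names here are made up, not from this thread):

    sudo zfs set sync=always tank/vm-disks    # 'tank/vm-disks' is an example dataset holding zvols/databases
    sudo zfs get sync tank/vm-disks           # verify

Ordinary file datasets can be left alone, since sync=standard is already the default.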
The reason for only using sync=always where it is needed is that sync writes carry a big performance penalty: in addition to the normal transaction group writes, sync writes also go to the ZIL, which by default lives on the same drives as the data. A SLOG is a way of reducing (but not eliminating) that penalty by redirecting the ZIL writes to a faster device, i.e. an SSD if the data is on HDD, or NVMe/Optane if the ZIL is already on SATA SSD.
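If you do decide to try it, adding and removing a log vdev is non-destructive, so it is easy to back out (the pool name and device path below are placeholders; use a stable /dev/disk/by-id path for your real drive):

    sudo zpool add tank log /dev/disk/by-id/nvme-EXAMPLE     # 'tank' and the path are placeholders; attaches the NVMe as a SLOG
    sudo zpool status tank                                   # the device appears under a separate "logs" section
    sudo zpool remove tank /dev/disk/by-id/nvme-EXAMPLE      # detach it again if it does not help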
HOWEVER…
If you are doing these specific workloads that need synchronous writes, then those workloads should also sit on mirrors in order to avoid read and write amplification (which has a completely different, separate performance impact on both reads and writes). And for virtual disks, you will probably be better off using them only for operating system boot, and accessing all your other data via e.g. NFS, where sequential access will normally be faster for both reads (which benefit from sequential prefetch) and writes (which are asynchronous).
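If you want to sanity-check the amplification point on an existing pool, the numbers to look at are the zvol volblocksize (or dataset recordsize) relative to the width of your RAIDZ vdev; the properties are easy to read (dataset names below are placeholders):

    sudo zfs get volblocksize tank/vm-disks
    sudo zfs get recordsize tank/data

Small blocks on a wide RAIDZ vdev are where the amplification hurts; on mirrors it is much less of an issue.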
Also, if the synchronous-write data is of a reasonable size, you should put it on SSD anyway, which avoids much of the need for a separate SLOG.
So, I ask again: what is your workload that needs synchronous writes?