Can I have a 2 TB ZIL that can make use of the full 2 TB as a write cache?

I’m new to ZIL, but I learned from a two-year-old YouTube video that the ZIL will only cache 5 seconds’ worth of data before flushing. So no matter how big the ZIL is, only a small amount of it will ever be used.

What I want to do is copy my 2 TB backup image to TrueNAS and have it land entirely in the ZIL (an SSD holding the 2 TB ZIL and L2ARC), so I can upload my backup to the TrueNAS quickly and walk away while the ZIL writes to the disks in the background.

So my question is: can I configure a ZIL on an SSD to be a HUGE write cache in front of my hard disk pool?

AI said this: “Some users opt for LVM Write Cache or Write-back Cache using underlying volumes. These volumes are then imported into the pool instead of raw disks.” But this appears to be gibberish.

There was a great thread here with the same issue and the answer is just to treat the SSD as tiered storage rather than muck with the ZIL.

I have 10G networking, and I basically want to upload a file quickly and let it commit to the pool as it has time.

The tl;dr of that post is that even the RAM cache holds only two transaction groups, i.e. about 10 seconds’ worth of writes at whatever speed your pool can sustain (which is not very fast). So 1 TB of RAM would be wasted.

But if I can increase the transaction time from the magic 5 seconds to an arbitrarily large number, I can use RAM and SSD as a cache.

So the bottom line is why is 5 seconds magical? What happens if I increase this value? How large can you make it?

I know all the experts advise NOT to increase the transaction time, but I would LOVE to understand the reasoning for the 5 seconds and what horrible things would happen if I changed it to 6 seconds vs. 6 minutes. I’m pretty sure my use case (which is modest home use) would be fine with a larger transaction time than 5 seconds.

The “Secondary Seperate ZFS Intent Log” (SLOG) is not really a “write cache” in the traditional sense. It’s simply where “sync” writes are saved, and then discarded once the system confirms the data has been safely written to the non-volatile storage of your pool.

It only comes into play with “sync” writes, and the only performance benefits* are found if the SLOG device is substantially faster than your main storage devices.

If there is no SLOG in play (as is the case with most users), then there’s a dedicated “area” on the pool itself that serves as the “ZFS Intent Log” (ZIL) for “sync” writes.

*To be clear, it is a “perceived” gain in performance. The data still has to be written to the slower HDD pool.
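(For reference, a SLOG is attached as a dedicated “log” vdev. A minimal sketch with a hypothetical pool name and device name; mirror the log device if you care about the last few seconds of sync writes surviving an SSD failure:)

```
# Attach an SSD as a SLOG (log vdev) to an existing pool named "tank".
# Device name is hypothetical. This only affects sync writes.
zpool add tank log /dev/nvme0n1

# The device now shows up under a "logs" section.
zpool status tank
```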


In your case, it sounds like you just want to quickly and securely transfer a large file to your pool?

If so, why not just transfer it using “rsync” (which confirms integrity on-the-fly) to your pool’s storage? (Or if you have a large SSD pool, you can transfer it there first, like a “staging area” of sorts, and then later at your own convenience, copy it to your main pool when time is not as crucial.)
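For example, something like this, assuming SSH access to the NAS (the paths, hostname, and username are hypothetical):

```
# -a preserves attributes, -h prints human-readable sizes,
# --partial lets an interrupted transfer resume instead of starting over.
# rsync verifies the whole-file checksum of each file it transfers.
rsync -ah --partial --progress laptop-backup.img admin@truenas:/mnt/tank/backups/
```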

2 Likes

Separate intent LOG, as in “ZIL that’s separate from the main pool”.

People say a lot of things. AI digests it into a mumbo-jumbo paste. It may fool those who’ve never tasted knowledge, but to those who have, it’s just a nasty paste.

What you have there is a particularly nasty flavor of paste, because it’s dangerously close to credible while being complete nonsense.

Really, it’s all in the name. It’s a Log, not a Cache. It’s also not clear how realistic your desired scenario is in the abstract (Why would you otherwise be unable to walk away? Is the single SSD really going to be that much faster than the main pool for a streaming write? Is your network fast enough for any of this to matter?)

4 Likes

I’m sure “secondary” translates into “separate” in some language out there… :neutral_face:

I was hoping to avoid the extra step of tiered storage.

Trying to understand why I can’t increase the transaction time. That would presumably affect both RAM and ZIL, allowing me to utilize 100% of the RAM and SSD.

My desire is VERY realistic. I was leaving on vacation. An hour before leaving, I decided to back up my laptop to a plug-in SSD. Super fast; it took 10 minutes.

So then I wanted to take the SSD with me but have a copy on my NAS in case I lose all my bags. So I started the copy, which was going at 50 MB/s. That was way too slow, so I had to abort it and was left without a backup.

I didn’t want that to happen again. So I’m upgrading to 10G Ethernet, and thought I could use RAM and/or SSD to cache the file so the NAS accepts it quickly and writes it to disk after I’ve left.

So now I find out the transaction time is 5 seconds for both ZIL and RAM (the RAM holds two transaction groups, so it can cache up to 10 seconds of sustained disk writes), but that’s limiting.

So I need to either:

  1. Increase the transaction time dramatically (why would that fail in my case?)
  2. Provision the SSD as a new pool, write to that, and treat it as tiered storage.

I’d love to understand why I can’t do #1.

It’s probably covered somewhere… I’m looking now.

I believe there’s a ZFS variable for that. Not at my computer, so I can’t look it up at the moment.
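If memory serves, it’s OpenZFS’s `zfs_txg_timeout`, the maximum number of seconds between transaction group commits (default 5). Roughly, and purely as a sketch:

```
# TrueNAS SCALE (Linux): read and change the txg commit interval.
cat /sys/module/zfs/parameters/zfs_txg_timeout         # default: 5
echo 30 > /sys/module/zfs/parameters/zfs_txg_timeout   # root only; not persistent

# TrueNAS CORE (FreeBSD): the same knob as a sysctl.
sysctl vfs.zfs.txg.timeout
sysctl vfs.zfs.txg.timeout=30
```

Note that a txg is also committed early once enough dirty data accumulates, so raising the timeout alone will not turn RAM into a big write cache.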

This option offers more flexibility.

It’s what I do with an SSD pool.

I guess the more general question is: why isn’t the ZIL like a FIFO queue? That way the fill rate and the empty rate could be completely independent, and it would use available RAM and SSD storage with maximum efficiency. Even if your UPS goes down, the system should just finish flushing the FIFO on restart. Zero data loss.

Failures become much more consequential, for starters. Also, you would need enough DRAM to support an insane SLOG (it’s a log, not a cache), and I’m going to go out on a limb and say you do not have multiple terabytes of DRAM available.
Also also, expect things to get flushed out much more frequently for a variety of reasons. You cannot magically stop everything the system is doing to fill up a giant TXG.
Abusing ZFS’ mechanisms in this way is unlikely to produce a usable system and it’s fairly likely to induce pain.

You’ve also provided zero evidence that you’re even bound by sync writes, which you likely are not, so the whole discussion is based on the misunderstanding of what the ZIL and SLOG are.

Because you can’t hand-wave away the myriad of things involved in storing data. Again, the ZIL is not a cache, but let’s look at ZFS’ write cache. It’s already in DRAM, it’s not going to get faster than that. You can’t just accumulate giant amounts of data and slowly write it to disk because that would break things left and right with async writes and be slow as hell with sync writes (because the data structures are not designed to be accessed in the same way that they are on-disk or in the ARC). Normal users care about continuous performance, not “drop in a large blob and do nothing else for the next hour”.
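For concreteness, that DRAM write cache is bounded by the dirty-data tunables, not by the 5-second timeout alone (Linux paths shown; defaults vary by system):

```
# Upper bound on dirty (not-yet-committed) data ZFS will hold in RAM:
cat /sys/module/zfs/parameters/zfs_dirty_data_max          # bytes, default ~10% of RAM
cat /sys/module/zfs/parameters/zfs_dirty_data_max_percent  # the percentage cap

# A txg is committed early once it holds this much of the above,
# regardless of zfs_txg_timeout, and writers get throttled near the limit.
cat /sys/module/zfs/parameters/zfs_dirty_data_sync_percent # default: 20
```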

3 Likes

“You’ve also provided zero evidence that you’re even bound by sync writes, which you likely are not”

SSDs are way faster than HDDs. Why would I need evidence that HDD write speed is the limit?

I’m trying to write a 1 TB file as fast as possible to TrueNAS, and today I’m limited by the inherent write speed of HDDs.

I was simply hoping for a better alternative than to use tiered storage.

Tiered (or “staging”) storage is under-appreciated.

Not only does a “staging” SSD pool allow for that, but you can also make it multifunctional, so as not to “waste” its potential. (A sketch of the staging flow follows the list below.)

For example: I have a dedicated mirrored SSD pool that serves the following roles:

  • staging area for large and/or temporary data dumps
  • incoming torrents
  • jails (“apps”)
  • logs
  • System Dataset
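The staging flow itself, as a minimal sketch with hypothetical pool and dataset names:

```
# 1. Upload to the fast SSD pool first; this is what a 10G link can saturate.
rsync -ah --progress laptop-backup.img admin@truenas:/mnt/ssdpool/staging/

# 2. Later, on the NAS itself, drain it to the HDD pool at its own pace.
rsync -ah --remove-source-files /mnt/ssdpool/staging/laptop-backup.img /mnt/tank/backups/
```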

You don’t have an adequate understanding of what a SLOG is; please read Some insights into SLOG/ZIL with ZFS on FreeNAS | TrueNAS Community.

There is no write cache in TrueNAS, so either you commit to assembling an expensive SSD pool, or you tell us about your file sizes: it might be doable to use SSDs as metadata VDEVs.
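(For reference, the metadata route means adding a “special” vdev, which can also be told to absorb small data blocks. A sketch with hypothetical names; note that large files written with a large recordsize would mostly bypass it:)

```
# Always mirror a special vdev: losing it means losing the pool.
zpool add tank special mirror /dev/sdx /dev/sdy

# Optionally send data blocks up to this size (not just metadata) to the SSDs:
zfs set special_small_blocks=64K tank/backups
```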

500 GB to 1 TB file sizes.

What I’m learning is to use the SSD as tiered storage. What is the recommended way to accomplish this?

Glusterfs? autotier? …?

That is your assumption. And while it is a possibility, there may be other causes in play. The only “hard” information I could gather from your posts is the transfer speed of 50 MB/s during your backup attempt.

There are a lot of possibilities other than single-drive HDD write speed that could have caused this. But without detailed information about your hardware and approach, we cannot help you.

1 Like

You were getting 50 MB/s with 500 GB to 1 TB file sizes? That’s low; I would expect at least double that… what is your dataset’s recordsize? For such files it should be at least 1M.
How were you connecting the external drive?
How full is the pool?
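For example (dataset name hypothetical):

```
# Check the current recordsize, then raise it for large sequential files.
zfs get recordsize tank/backups
zfs set recordsize=1M tank/backups   # only applies to newly written files
```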

1 Like

To begin with, you have not provided any evidence that your workload uses sync writes.
A SLOG is ONLY used for sync writes. If your workload is asynchronous, as it should be if you’re seeking performance, not a single byte will ever be logged to a SLOG.
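You can verify this yourself (dataset name hypothetical):

```
# "standard" means sync is only honored when the client asks for it;
# plain SMB file copies are typically async.
zfs get sync tank/backups

# Watch the pool during a transfer: if a "logs" vdev shows no writes,
# nothing is hitting the SLOG.
zpool iostat -v tank 1
```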

Now if your HDD pool, whose geometry has not been disclosed either, can only sustain writes at 50 MB/s, you have a more fundamental issue, and a SLOG will not solve it.

1 Like

Fair enough. My 400 GB file was on an SSD plugged into a USB 3.0 port on a PC with 1 Gb Ethernet; the TrueNAS box is on 1 Gb Ethernet as well. I mounted an SMB share from TrueNAS on the PC and did a Windows copy of the file from the NVMe SSD to the TrueNAS SMB mount of a mirrored volume with Seagate CMR disks (12 TB disks). I was getting around 40 MB/s throughput, which is about half of what I should presumably see. Not sure why this is so slow, but I also tried copying from the NVMe to a LOCAL USB 3.0 hard drive plugged into the same PC and was getting similar throughput.

So if I beef up the network and write to an SSD dataset on the TrueNAS, I’m sure I’ll do better. The TrueNAS box has 64 GB of RAM.

And I agree: the ZIL would never be used in this case, since these are all async writes.

Try from the PC to the NAS; you might see different results.
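Or take the disks out of the equation entirely with a raw network throughput test, assuming iperf3 is installed on both ends (hostname hypothetical):

```
# On the TrueNAS box:
iperf3 -s

# On the PC: measures pure TCP throughput, no disks involved.
iperf3 -c truenas.local
```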

Then it seems your problem lies there, not with TrueNAS or the network.

3 Likes