Questions from someone starting out in the world of servers

So, if I put the dedup table on an L2ARC SSD, will the dedup table also remain on the data HDDs?
And if I put the dedup table on a Special vDev, for example four SSDs in RAIDZ2, will the dedup table still remain on the data HDDs?

Yes, because L2ARC is not redundant.

No, putting the De-Dup table on a Special vDev will prevent it from being in the data HDDs.

Further, it is my understanding that you can NOT have Special vDevs in RAID-Zx or dRAID layouts. Special vDevs are limited to Mirrors only, (or a single disk Stripe, which is of course not recommended). You CAN have multiple Special vDevs, potentially for different purposes.
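
For reference, on the command line a mirrored Special vDev (or a dedicated De-Dup vDev) is added roughly like the sketch below. The pool name "tank" and the disk paths are only placeholders, and on TrueNAS you would normally do this through the web UI rather than by hand:

```bash
# Add a mirrored Special vDev (metadata / small blocks) to a pool named "tank".
zpool add tank special mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2

# Or add a dedicated De-Dup vDev, also as a mirror.
zpool add tank dedup mirror /dev/disk/by-id/ata-SSD3 /dev/disk/by-id/ata-SSD4
```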

One last comment. Special vDevs are not an end-all, perfect solution. When they get full, writes then continue on the regular data vDevs. This is still better than the old *nix file systems that had limited I-Node counts, which had to be tuned at file system creation. A file system intended for tiny files had to be created specifically to support them, because the default was not good enough. ZFS does not have that limitation.
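
If you do end up using a Special vDev, you can watch how full it is getting with something like this (pool name is a placeholder):

```bash
# Show capacity per vDev, including any special / dedup vDevs,
# so you can see when they are getting close to full.
zpool list -v tank
```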

I may be wrong on some of the details, so again, lots of research is needed.

1 Like

How much space does the dedup table need? What is the size of the dedup table?

Sorry, I don’t know the individual entry size.

But, the total De-Dup table size is directly related to the number of entries it has. So dataset(s) with De-Dup enabled that have few unique blocks, but lots of common blocks that are De-Dupped, would have a small De-Dup table. (Unique blocks might have De-Dup table entries so that future writes can check if De-Dup is possible with that block. I don’t know that level of detail…)

Remember, ZFS De-Dup works on blocks, not files. A ZFS dataset with a block size of 64KB, storing a 256KB file, would have 4 chances for parts to be De-Dupped. But that could also need 4 De-Dup table entries, if all 4 x 64KB ZFS blocks were De-Dupped.

Yet, using dataset(s) with 1MB block size and a 1MB file with a single bit different from dozens of other 1MB files, would not be De-Duppable.

It is a trade off. Smaller dataset block sizes allow for a higher chance of De-Dupping. BUT, that could then require more, potentially a lot more, De-Dup table entries.
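
If you want to see what that trade off would look like for your own data, OpenZFS can estimate it before you commit. A rough sketch, with pool and dataset names as placeholders:

```bash
# Hypothetical dataset settings: smaller records give more De-Dup chances,
# but also a (much) bigger De-Dup table.
zfs set recordsize=64K tank/mydata
zfs set dedup=on tank/mydata

# Simulate De-Dup on an existing pool and print the estimated table
# histogram and dedup ratio, without actually enabling anything.
zdb -S tank

# On a pool that already has De-Dup in use, show dedup table statistics.
zpool status -D tank
```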

2 Likes

Not related to dedup:
Some say that there is a rule that for every 1TB of data the ideal would be to have 1GB of RAM.
If ZFS is using all the RAM that the motherboard supports, is it possible to use an SSD as RAM, creating a “virtual memory/pagefile/swap” or something like that?
Would this be the L2ARC?

That is not really a true “rule”. It was a firm suggestion in the past, but for SOHO, (basically most people here in the forums), we routinely discount that suggestion without real issues.

Of course, more RAM is better, but does not always help in all cases.

Again, no. There is no “virtual memory/pagefile/swap” in TrueNAS SCALE, and L2ARC is not for that purpose.

L2ARC is a secondary cache for the ARC, (Adaptive Replacement Cache). See this;
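
For reference, an L2ARC device is attached per pool, and since it only holds copies of data it can be removed at any time. A rough command line sketch, with the pool and device names as placeholders (the TrueNAS UI is the normal way to do this):

```bash
# Add an SSD as L2ARC (cache) to a pool named "tank".
zpool add tank cache /dev/nvme0n1

# Cache devices are not required for pool integrity and can be removed later.
zpool remove tank /dev/nvme0n1
```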

1 Like

So if I have a motherboard with 128GB of RAM (the maximum it supports), will it be complicated to have 200TB of data?

Nope.

The real problems are un-related to RAM, well mostly. If you value your data, ECC RAM is desirable, (which you have listed).

For 200TB of data, you run across:

  • Backups - ZFS Mirror / RAID-Zx redundancy is not backups
  • Time it takes to get a new disk for a failed disk, which may impact how long you have reduced / no redundancy
  • Length of time to re-silver a failed disk, sometimes days, others have had problems taking weeks
  • RAID-Zx vDevs should not be too wide, aka made up of more than 12 disks. After that, you add a new vDev.

The exact width of a RAID-Zx vDev is debatable. Some say they use 15, even 18 disks successfully. Others notice a slowdown over time because the data has become fragmented AND has to be read from the entire stripe, (minus parities). They then find copying data off is a nightmare because the pool is now too slow.

Some of the problems can be worked around:

  • Because TrueNAS is a NAS, Network Attached Storage, many of the backup methods are network based. So using 10 Gbit/s Ethernet or higher can be helpful. However, some people want local backup methods…
  • Spare disk on site, (aka cold spare), already burned in & tested
  • Using 3/4 way Mirrors or RAID-Z2/3. With large disks, like >=2TB, RAID-Z1 is not recommended. Eventually the same large disk concern will apply to 2 way Mirrors, I just don’t know if the threshold is 10TB or 20TB. It has to do with disk read error rates…
  • Having a spare disk slot in the server can allow replacing a failing, but not completely failed disk, using “replace in place”.

The “replace in place” was not a common feature when ZFS came out. Basically a user adds the new disk to the server and tells ZFS to replace the failing disk. Any good data is taken from the failing disk, and gets copied to the new disk. Any bad data is taken from redundancy, (Mirrors, RAID-Zx, dRAID, copies=2/3). When done, the failing disk is removed from the vDev / pool. This attempts to keep as much redundancy as possible during disk replacement.
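
As a rough sketch, a "replace in place" from the command line looks like this, with the pool and disk names as placeholders:

```bash
# Copy data from the failing disk (falling back to redundancy where needed)
# onto the new disk, then detach the failing disk when the resilver finishes.
zpool replace tank /dev/disk/by-id/ata-FAILING /dev/disk/by-id/ata-NEW

# Watch the resilver progress.
zpool status tank
```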

2 Likes

So I think this will be the last question:

In normal data pool in RAIDZ1, for example:
Three data HDDs + 1 HDD for redundancy/parity.
When I add a new HDD to the pool, can I choose whether it will be a data HDD or a redundancy/parity HDD?

In case of a Special vDev, can I add more HDDs to the MIRROR?

That’s what Unraid does.

ZFS does not have “parity drives”.

You can either add to the pool a new vdev of multiple drives, such as another RAIDZ1 vdev, or “expand” the existing RAIDZ1 vdev with an additional drive. If you do the latter, then only newly written data will take advantage of the improved storage efficiency.
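
As a sketch of the two options, with placeholder pool and disk names (RAID-Z expansion also needs a recent OpenZFS / TrueNAS release that supports it):

```bash
# Option 1: add a whole new RAID-Z1 vDev to the pool.
zpool add tank raidz1 /dev/sde /dev/sdf /dev/sdg /dev/sdh

# Option 2: RAID-Z expansion, grow the existing raidz1 vDev by one disk.
zpool attach tank raidz1-0 /dev/sdi
```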

1 Like

Unlike traditional RAID-5/6, ZFS’ RAID-Zx uses only as many columns as needed. So for a large file on a 4 disk RAID-Z1, ZFS will write as many 3 column data stripes as needed, each with its own parity column.

However, when dealing with a file smaller than the Dataset’s block / recordsize, for a 4 disk RAID-Z1, ZFS will write a single data column and 1 parity column. This also applies to the end of a large file, if it does not fill up a full 3 column data stripe.

Basically, there is no fixed column width in ZFS’ RAID-Zx.

Note that you can not increase RAID-Zx parity level after the fact. If you choose RAID-Z1 at vDev creation, then decide you want more security offered by RAID-Z2, it’s backup, destroy, create and restore time.
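
A minimal sketch of that backup / destroy / create / restore cycle using zfs send and receive, assuming a second pool (here called "backup") with enough free space; disk names are placeholders and a real migration needs more care than this:

```bash
# 1. Snapshot everything and copy it to the backup pool.
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -u backup/tank

# 2. Destroy the pool and recreate it with the layout you actually want.
zpool destroy tank
zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd

# 3. Restore the data onto the new pool.
zfs send -R backup/tank@migrate | zfs receive -F tank
```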

Yes, but you would not generally use Hard Disk Drives for a Special vDev. Even SATA SSDs would be better than HDDs, with NVMe of course being the top tier at present.

It is also possible to change a 2 way Mirror to a 3 way Mirror, at any time. And add another Special vDev if / when the existing one(s) become full.
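
For example, turning a 2 way Special vDev Mirror into a 3 way Mirror is just an attach against one of the existing members (pool and disk names are placeholders):

```bash
# Attach a third SSD to the mirror that already contains ata-SSD-A,
# turning the 2 way mirror into a 3 way mirror.
zpool attach tank /dev/disk/by-id/ata-SSD-A /dev/disk/by-id/ata-SSD-C
```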

However, if the pool has any RAID-Zx or dRAID vDevs, you can not remove Special vDevs.

2 Likes

I will do some tests on TrueNAS and ZFS using a virtual machine, to learn how it works.

Hi. Me again.
An example: if I have two data pools and one SSD.
Can I divide this SSD into two parts/partitions and use each part as a special vdev (cache, dedup…) for each pool?
Or, for these special vdevs, is it only possible to use the whole disk for one pool?

@Arwen nice patience.

@GregHouse Sure, you could probably split a single SSD into two partitions and use it with two separate pools, but you would definitely want a mirror. If you split a single SSD into two partitions to use as a dedup table for two pools, and you don’t have a mirror of that drive, then losing that single SSD means you lose BOTH of your data pools (all data). I’ve seen mention of people doing this with Optane (mirrored, very durable, low latency), but I don’t think it’s worth the trouble.

Why do you think you would benefit from a dedup table? Which record size are you going to use? What is the type of content you plan to store?

Why do you think you would benefit from L2ARC cache? What are your ARC hit statistics? How many simultaneous users will be using this storage? What is your use-case that will repeatedly query more data than your RAM can store, and need it as quickly as possible?
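
You can answer the ARC hit question for yourself with the standard OpenZFS reporting tools, which as far as I know ship with SCALE:

```bash
# Summarize ARC size and hit / miss ratios.
arc_summary

# Or watch live ARC statistics, one line per second.
arcstat 1
```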

Not to deter you from seeking more information (there are a lot of great articles on the internet to help you learn about these features), but I’m betting you do not need any of the special vdevs. Keep it simple.

To your RAM question, I had 700TB of data on a system with 64GB of RAM and 0 problems. I only upgraded because I needed more PCIe and more RAM (for services). The old rule-of-thumb is really not that relevant in my opinion.

1 Like

OK. Thanks for the explanations.

Not wanting to bother, but already bothering.

I have three pools: A, B and C.
Pools A and B are for data.
I want to use pool C for, for example, L2ARC cache.
How can I divide/partition pool C so that one half is used by pool A and the other half by pool B?

ZFS doesn’t let you “slice” a pool itself into two separate cache areas; L2ARC (and SLOG) devices have to be block devices (whole disks, partitions, or files/volumes), not parts of a live pool. You have basically two ways to carve up “Pool C” so that half of its capacity ends up as cache for Pool A and half for Pool B:


1) Partition the underlying disks

If Pool C lives on raw disks (or NVMe drives), you can:

  1. Destroy Pool C (backup your data first!).
  2. Re-partition each disk into two equal partitions, e.g.

```bash
/dev/sdc → /dev/sdc1 (50%) + /dev/sdc2 (50%)
/dev/sdd → /dev/sdd1 (50%) + /dev/sdd2 (50%)
…
```

  3. Recreate Pool C on whichever partitions you still want for storage.
  4. Use the other partitions as cache devices:

```bash
# half of each disk goes to Pool A’s L2ARC
zpool add A cache /dev/disk/by-id/…-part1 /dev/disk/by-id/…-part1

# the other halves go to Pool B’s L2ARC
zpool add B cache /dev/disk/by-id/…-part2 /dev/disk/by-id/…-part2
```

Each pool now has its own dedicated L2ARC devices carved from the same physical drives.


2) Create ZVOLs (or cache files) inside Pool C

If you’d rather not re-partition hardware, you can carve out block volumes or files inside Pool C and present them as cache devices:

  1. Make two ZVOLs (each half the size of Pool C’s SSD vdevs):

```bash
# suppose Pool C has 1 TB of SSD capacity
zfs create -V 500G poolC/cacheA
zfs create -V 500G poolC/cacheB
```

  2. Add them as L2ARC to your data pools:

```bash
zpool add A cache /dev/zvol/poolC/cacheA
zpool add B cache /dev/zvol/poolC/cacheB
```

(Or, if you really prefer files, you can dd out two 500 GB files and do `zpool add A cache /mnt/C/cacheA.img`; according to an answer on Server Fault, ZFS will happily use files as cache.)


Caveats

  • Performance isolation
    Both methods share I/O paths with Pool C. Partitions on the same disk still contend for the same NAND/channel, and ZVOL-based L2ARC goes over the ZFS I/O pipeline. Dedicated raw partitions will give the most predictable latency.
  • Destruction required (for the partition approach)
    You must tear down Pool C and repartition the disks—back up all data first.
  • No dynamic “slice‐off”
    There is no way to carve out half of a live ZFS pool on-disk without recreating it or using ZVOLs/files.

Bottom line: pick whichever trade-off you prefer—raw partitions for the cleanest performance or ZVOLs/files for flexible, in-pool slicing—but you cannot simply “split” Pool C without one of these lower-level tricks.

Signed,
JorshGPT

1 Like

By the way, I don’t recommend doing either of the above, but I don’t recommend bothering with L2ARC cache either.

I just didn’t want to put more effort into this than you do.

Why do you think you would benefit from L2ARC cache? What are your ARC hit statistics? How many simultaneous users will be using this storage? What is your use-case that will repeatedly query more data than your RAM can store, and need it as quickly as possible?

If you’re really sure you need this (you don’t), you’d partition the disk into two. Create the l2arc vdev for each pool, and select one of the two partitions.

1 Like

Thanks again for your help and your patience.

I really recommend starting with a simple raidz or mirrored pool and seeing if you actually encounter any situations that would benefit from the special vdevs. For home use, you probably won’t.

A lot of people who are new to ZFS / TrueNAS want a cache but don’t understand how it functions, and can’t articulate why they think it’ll benefit them. If you’re using your NAS for documents, photos, and videos (like most home users), L2ARC is highly unlikely to give you any benefit and dedup is virtually useless.

Here’s some more good reading:

1 Like