The ZFS "copies" feature is useless: Change my mind

We start with the premise that I am always correct.

Ergo, I can never be incorrect.

Ergo, any position that opposes mine is de facto incorrect.

Therefore, you cannot refute my claim. It’s futile.


There is a ZFS dataset property called copies.

This is a useless feature.

You can set it to copies=2 or copies=3, which will write two copies or three copies of every block that is saved to the dataset. The vdev doesn’t matter, as this also applies to non-redundant stripes.
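For reference, enabling it is a one-liner (pool and dataset names here are hypothetical), and it only affects blocks written after the property is set:

zfs set copies=2 tank/important
zfs get copies tank/important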

Here’s why I find it to be useless:

  • You’re much more likely to deal with a failing or failed drive than with bitrot.

  • Even if you do face bitrot, any type of redundant vdev can repair the block.

  • The simplest redundant vdev (two-way mirror) offers the same usable capacity as a single, non-redundant drive using the copies=2 property.

  • A redundant vdev on its own can recover from a complete drive failure and bitrot. The copies=2 feature only safeguards against bitrot.

Even the “best” argument in favor of copies=2 doesn’t hold much weight:

“You can save money by purchasing only a single drive, and use copies=2 for safeguarding your data.”

Why not purchase two drives at half the capacity and create a two-way mirror with the same usable capacity? You’ll also enjoy all the benefits of redundancy, protecting against both bitrot and drive failures.
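For example (with hypothetical device names), the two-way mirror is itself a one-liner:

zpool create tank mirror /dev/ada0 /dev/ada1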


Checkmate. I win. :crown:

Useless? I’ll bite, and really you’ve already mentioned in passing the case where it would help: bitrot, or its close cousin, unreadable sectors. If you have a fairly small amount of data you need to store, such that “two drives at half capacity” doesn’t really save you any money, copies=2 gives you some limited amount of redundancy for that data. It isn’t ideal, to be sure (far from it), but it’s still better than copies=1.

5 Likes

I see where you’re coming from, but I just can’t see a real-world use case for this that overcomes the fact that a two-way mirror already offers what copies=2 safeguards against, and beyond.

At face value this seems like one of those narrow cases where maybe copies=2 does make sense. But I would argue it only offers “convenience” at this point. Why? Well, let’s have fun with a simple example:

You have a folder of some important documents that total 500 MiB in size. With any filesystem, you can simply make a copy of this folder on the same drive. (With the caveat that you don’t make “reflinks”, which newer versions of cp will do by default on filesystems that support them, such as XFS, as far as I’m aware.)

Right-click → Copy → Paste anywhere else in the drive. Done.
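Or from a shell, something like this (paths hypothetical), where --reflink=never forces a full physical copy on filesystems where GNU cp would otherwise reflink:

cp -r --reflink=never ~/documents ~/documents-copy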

The benefit of ZFS copies=2 is that there exists a dataset where this is done under the hood, tucked away from the user for their convenience.

It still doesn’t seem convincing that copies=2 (or heaven forbid copies=3) offers much beyond what we already have available (i.e., redundant vdevs and the ability to make a duplicate of an important folder).

I could take a small HDD, archive it in cold storage, and within an exFAT filesystem, I have an important folder of documents (i.e., a “fairly small amount of data”) copied over and over multiple times. No need for ZFS, let alone copies=2.

I want to be clear that I’m not arguing that copies=2 is a good idea, but rather against your absolute statement that it’s useless, which I take to mean that it has no value whatsoever. I think it has a little value. Not much, and in very limited circumstances, but some.

2 Likes

It offers error correction on a single device.

The real benefit is copies=2 on a mirror :wink:

Now you effectively have a 4-way mirror that can withstand a single total device failure too :-p

Good perhaps for a backup target? Or, as we have sometimes recommended, when someone has a single-bay NAS and a single drive.

Ergo, a use. Ergo not useless.

QED

5 Likes

You can’t ergo my ergo. That’s against the rules.

Ergo, you have forfeited the debate. Ergo, I win.

Ergo to the ergo, @dan is correct in noting my (mis)use of the term “useless”, but that’s probably the lawyer in him. :wink:

1 Like

I never accepted your premise.

It’s a ‘poor man’s’ implementation of mirroring, I suppose (I don’t like the term, but it’s apt for the solution). We’ve of course seen people trying to install TrueNAS on laptops, etc. (It actually wouldn’t shock me at all if someone tried to run it on a Steam Deck; now that I’ve thought about it, I’m curious enough to buy one and try.)

Assuming they’re running a single-disk installation, say a USB-SATA SSD for boot and the internal SSD/M.2 as a single, zero-redundancy pool (god forbid), copies could potentially act as an initial safety net if it starts throwing out checksum errors. As long as any of the copies reads and matches the hash, the data should remain readable until the disk is cloned/replaced/whatever, and then hopefully the person decides that perhaps running TrueNAS on a laptop is not such a good idea after all.
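As a sketch of what that safety net looks like in practice (pool name hypothetical): a scrub forces every block to be read, and the status output shows whether checksum errors were found and repaired from the surviving copies:

zpool scrub tank
zpool status -v tank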

1 Like

Back in 2014/2015 I moved my old laptop (running Linux) from EXT4 to OpenZFS. It had a single 2.5" SATA drive (because it was also a cheap, 11.9" laptop…). For this situation, “copies=2” makes reasonable sense.

Now, I had heard that some SSDs optimize redundant data out, thus making ZFS “copies=2” useless. So instead I partitioned the 1 TB SSD further, carved out 2 x 50 GB partitions for the OS, and mirrored across them. The remaining space was “/boot”, swap, or the huge dumping ground.
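Roughly like this, with hypothetical partition names:

zpool create rpool mirror /dev/sda2 /dev/sda3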

When I looked to replace that laptop back in 2022, I tried to find one with 2 x internal storage devices. The HP ProBook I found does have 1 x 2.5" bay and 1 x NVMe PCIe bay. So I can do real, separate device mirroring.

3 Likes

Maybe a misdirected write is the failure mode that copies=2 protects against, but a mirror does not.

Making a backup on a large, single drive, which will then have protection from bitrot while offline.
Nothing that couldn’t be done with a pair of external drives, but having it in a single package may be more convenient.

That, and using ZFS on the single drive of a laptop are admittedly marginal use cases, but the feature is not totally useless.

1 Like

:rotating_light: :partying_face: :loudspeaker:BONUS ROUND! BONUS ROUND!:rotating_light: :partying_face: :loudspeaker:

Can anyone find me just one example of an individual or company who would have lost valuable data if not for copies=2 on a single drive?

Because I’m sure everyone here can find examples of redundant vdevs protecting against data loss due to drive failures.

No answers? All I hear are crickets. :cricket: :cricket: :cricket:

Checkmate. Again. :crown:

Arthur? Is this you???

2 Likes

What is a misdirected write? The drive writes data to a logical address but “misses” the associated physical location?

Yes, where you ask the write pipeline to write to location X, and the data ends up at location Y, due to an error somewhere along the pipeline. Note that the pipeline is not limited to disks; it also includes controllers and whatnot.

Another consideration is that the copies=N feature is used for metadata anyway (including copies=3 for the parts deemed more critical). I suppose it was cheap to expose this as a user-configurable setting that applies to data too, even if it is not useful in production.
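If you want to see the user-visible side of that, both knobs show up as ordinary dataset properties (dataset name hypothetical):

zfs get copies,redundant_metadata tank/important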

2 Likes

Make an image with copies=3, burn it to a CD/DVD. Lots of additional protection from media scratches and rot.

Does anybody know offhand if the additional copies are eligible for random reads? I can’t imagine how an elevator could predict which would be fastest.

Ancient blog post from Richard Elling including reliability maths:
ZFS, copies, and data protection : Ramblings from Richard's Ranch

Edit: And this from Jim Salter.
testing the resiliency of zfs set copies=n

4 Likes

Actually I was looking at exactly that. I have a 25GB M-Disc that I’ve been meaning to burn with lots of stuff. And for security’s sake, encrypt the pool. Even using “copies=3” makes sense.

Of course, for me, barring media, my “lots of stuff” is in the GigaByte range, so I should have bought 4.7GB M-Discs instead.

2 Likes

I think you nailed it. It’s convenient. I’m glad you were able to change your mind.

It’s a useful feature for a small quantity of important documents when you have very real physical disk constraints. You can eliminate the step of copying your data manually to two different locations by creating an “important” dataset with copies=2.

2 Likes

Boy do I recognise that! After getting my system up and running over the last few weeks, I spent this weekend copying old archive CDs and DVDs of photos and documents from the 90s and early 2000s to the server for safekeeping - highlight of the CDs had to be one that had got really oxidised and took nearly 4 hours to copy across…

I still don’t know how that would work in practice.

Do you “send” a snapshot into a file, and then burn that file onto a disc?

zfs send -wLec mypool/mydata@archive > archive.zfs.img

Then just burn that one (large) file onto an M-Disc?

It wouldn’t be easy to access the files within. How would you access them? Create a ZFS pool, and then “receive” the file into the new pool as a dataset/snapshot?

You wouldn’t be able to access the files directly on the disc, no?
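One approach that should allow direct access, sketched here with hypothetical names and sizes, is to skip the send stream entirely: build a small pool inside an image file, set copies=3 on it, copy the files in, export the pool, and burn the image file to the disc. Reading it back would then just be a read-only import of the file vdev from the mounted disc:

truncate -s 20G /tmp/disc.img
zpool create -O copies=3 archive /tmp/disc.img
cp -r ~/important-documents /archive/
zpool export archive
# burn /tmp/disc.img to the disc, then later:
zpool import -d /mnt/cdrom -o readonly=on archive

I haven’t tried this against actual optical media, so treat it as a sketch rather than a recipe.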