Hah, I did say to disregard any implementation details~
My question was, more directly: would storing a small amount of parity data per file be more ‘useful’ than the copies property? Specifically for the long-term longevity of more-or-less static data.
Par2 works by breaking a file into N blocks, then creating M parity blocks for a total of N+M blocks, where M can be as little as 1 and is typically much smaller than N. As long as you have at least N good blocks in total, whether data or parity, you can recreate the complete file.
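To make that concrete, a minimal par2cmdline session might look something like this (exact flags can differ between versions, so check par2 --help on your system first):

```
# create ~10% worth of parity blocks for a large static file
par2 create -r10 archive.par2 big-video.mkv

# later: verify the file against its parity set
par2 verify archive.par2

# if some blocks are damaged but at least N good blocks survive, rebuild
par2 repair archive.par2
```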
If space is limited, or you’re restricted by the media (DVD-R, Blu-ray, M-Disc), and the files are static (long term archive), then something like this makes sense.
Such a tool already exists, which you acknowledge: Par2.
Implementing this within ZFS would have to target blocks, not files. It’s theoretically possible. (Ignoring the rewriting of core ZFS code and the added difficulty and system resources required for seemingly little extra benefit.)
You could have some sort of “virtual” raidz. Actually, a really bad way (please don’t do this, it would suck) would be to split your drive into 10 partitions and then make a 10-wide raidz1.
I was thinking about exactly this distinction as I was reading through. It seems to me a concise way of explaining the difference between how ZFS, and someone administering it, regard data, and how a “person using storage” to store files thinks about their data. In the abstract, we humans tend to care more about files than blocks.
If copies=2 is useless from a data resiliency perspective, does it have value from the perspective that people just really want to protect their data? (I guess another way of putting it would be “is it psychological”?)
@NickF1227 mentioned that metadata is by default written twice. I am sympathetic to the argument that, if metadata already is (and can be) written twice or more, it makes sense that you can optionally write data more than once as well.
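For anyone following along, both knobs are ordinary dataset properties (dataset names here are just examples):

```
# metadata is stored redundantly by default
zfs get redundant_metadata,copies tank/mydata

# opt a dataset's file data into extra copies as well
zfs set copies=2 tank/mydata
```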
Personally, I’m using copies=2 on a small number of datasets that store the kind of data I absolutely cannot afford to lose. Those datasets get snapshotted by two different jobs, one of which then replicates to another ZFS pool in another location (not running TrueNAS, for OS heterogeneity), and also get backed up by restic to a destination that is snapshotted regularly. Then I have a manual replication job to copy all the snapshots to a (quality) NVMe SSD housing a ZFS pool, attached via UAS.
Arguably with all of that I don’t need copies=2 (except maybe on the UAS destination), but it just makes me feel better. And it might just save my bacon one day, or I might have minimized my risk so much that I’m just LARPing. With those datasets though, I’d rather never find out.
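For what it’s worth, the manual replication job to the UAS-attached SSD pool is roughly a recursive send/receive like this (pool and dataset names are placeholders, not my actual layout):

```
# recursive snapshot of the critical datasets
zfs snapshot -r tank/critical@manual-2024-06

# replicate everything, including earlier snapshots, to the SSD pool
zfs send -R tank/critical@manual-2024-06 | zfs receive -Fdu ssdpool/backup
```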
I will answer your implementation comments too, as you got me thinking about it, though I still don’t know if it would be worth generating file-based parity data, as opposed to using existing features.
I imagine if I were to do this, I would be using a script to generate/update parity data for files within a directory tree on demand, or on a schedule, where I expect the files to not change much over the scale of years, and store said data within the directories so they get deleted along with the directories.
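A rough sketch of the kind of script I have in mind, assuming par2cmdline is available (the paths and the 5% redundancy figure are arbitrary, and this is untested):

```
#!/bin/sh
# Walk an archive tree and (re)generate ~5% parity per file,
# keeping the .par2 files next to the data so they live and die with it.
ARCHIVE=/mnt/tank/archive

find "$ARCHIVE" -type f ! -name '*.par2*' | while read -r f; do
    # skip files whose parity is already newer than the data
    [ -e "$f.par2" ] && [ "$f.par2" -nt "$f" ] && continue
    par2 create -r5 "$f.par2" "$f"
done
```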
This I can’t say too much about, as I am not informed about the actual inner workings of ZFS, but I imagine such theoretical ZFS parity blocks would be the same size as regular ZFS data blocks, only there would be some static or proportional number of them per file within the filesystem, or else they would be implemented as a separate special file with all the parity data blocks inside.
To this day I’m still waiting for one person to say, “I was able to prevent data loss because I used copies=2 on my single-drive pool.”
I’ve seen plenty of examples of people preventing data loss from using mirror or RAIDZ vdevs, such as when they lose an entire drive. (Even though bitrot is much less common than drive failure, such setups also safeguard against bitrot.)
I came by this thread while researching after a near miss~ Or it seemed like one anyway; the probability of data loss was, I believe, on the order of 1/200, heh~
If I understand the Par2 request correctly, we have this:
2-way Mirrors prevent data loss if 1 disk fails completely. Further, we can have millions of pieces of bit-rot across both disks, as long as the Mirror copy of each bit-rotted block is good.
RAID-Z1 protects similarly to a 2-way Mirror: loss of 1 disk is no problem. And as long as bit-rot only affects 1 column per RAID-Z1 stripe, it is fully recoverable, even if all disks have bit-rot.
Par2 stores some extra recovery information, the amount of which depends on the user’s selection. Thus, for important files, more recovery information, and thus more storage, is needed. But this allows more bit-rot / disk loss to occur without data loss.
In essence, Par2 is a variable stripe width RAID-Zx implementation that is geared more toward bit-rot than disk / column loss. Meaning RAID-Zx will recover from the number of failed disks given by its Z level, 1-3. But Par2 parity could fail on a plain stripe of disks because it may not have enough redundancy. For backup media, this is probably fine. Single disk / DVD / Blu-ray / M-Disc…
Mind you, no such behaviour is known for the Protease-Activated Receptor 2
and it is always a mistake to assume that interlocutors understand what you’re talking about if you do not define the context first.
For redundancy at a fraction (x < 1) of the size of the files, there actually is a pool-level solution: Partition the drive(s) and make a raidz out of the partitions, as suggested by @lonjil .
For 4-5 partitions and raidz1 it is an almost reasonable arrangement, and it has the remarkable benefit that it can be implemented by leveraging existing ZFS code.
My current media server is a small all-in-one PC, kinda designed for embedded use. It is a Fit-PC fitlet-H, for those that care. It has an mSATA slot and a 9mm-high 2.5" SATA drive slot. It currently has a 1TB mSATA drive and a 2TB SATA HDD installed, where part of each is ZFS Mirrored for the root pool. The rest is striped for the media.
The media pool occasionally loses a file due to bad sectors / bit-rot. Generally a video file, because they are much larger than the music or photos, so statistically more prone to data loss. Not a problem, I have good backups.
So, not a “copies=2” event?
Correct.
However, I had a correctable event on that media pool, which has ZERO redundancy. I thought, HOW???
After a few weeks of thought (pulling out hair, screaming, etc…), easy. It was “redundant_metadata=all”, which uses 2 copies for metadata. Apparently, that pool lost a metadata block that was easily recovered from its 2nd copy.
Maybe not your classical “copies=2” use, but it nevertheless prevented further loss of the underlying data that this metadata pointed to.
Yes, and with something akin to par2 and M parity blocks stored somewhere, one could recover M file blocks where those blocks have bit rot on both mirror copies. Additionally, though it runs against how ZFS works, if drive A and B of a mirror each had a different set of parity blocks, it would double overall redundancy, even if individual parity blocks are much more prone to damage. Within par2, duplicates of file or parity blocks are not useful for reconstructing the file, but you can have any number of unique parity blocks. In any case, this sort of resiliency is what I want to figure out the best way to achieve.
I’ve heard this method of raidz over partitions being referred to as a ‘forbidden raid’ though, and thought the practice was frowned upon? Though this may have been more for a single disk solution, come to think of it. Does TrueNAS support this sort of partitioning arrangement officially? What would be the best arrangement to set it up over multiple disks?
TrueNAS does not support using partitions, only whole drives.
But you can do it in the CLI. If you want some resiliency against bit rot in a single drive, at a lower space cost than copies=2, this is a neat solution. Of course it does not protect against hardware failure, and IOPS are dismal.
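Roughly, the CLI version on a spare disk would look something like this (made-up device names and sizes; TrueNAS won’t manage a pool built this way):

```
# carve one disk into 5 equal partitions
sgdisk -n1:0:+800G -n2:0:+800G -n3:0:+800G -n4:0:+800G -n5:0:+800G /dev/sdb

# raidz1 across the partitions: ~20% overhead, recovers bit rot in any
# one "column" per stripe, but the loss of the physical disk kills it all
zpool create -o ashift=12 singlez raidz1 \
    /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdb5
```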
I have a legitimate use case for copies=2
As indicated, it is only useful for bitrot, and there are always “better” ways to do things, but…
I have an old Intel NUC that only has space for a single SATA 2.5" drive. No NVMe, no expansion.
So no way to mirror effectively. But this NUC has legitimately suffered from some bitrot, which ZFS identified during a scrub. If I had had 2 copies of the affected files, they would have been corrected automatically.
Of course, copies=2 is just going to burn out the drive even faster, but this old SSD is severely overprovisioned to accommodate reallocation of bad sectors, and it’s not mission critical, just a UPS NUT server really, with a backup pihole instance. Not TrueNAS, just Proxmox, but still ZFS.
ERGO, copies=2 would definitely have been beneficial in this very slim use case, instead of where I am now: trying to reinstall random libraries to rebuild the bad system files.
Although it is not supported, I ended up manually partitioning and setting up a single-drive mirror on my boot SSD. It doesn’t have nearly as bad IOPS for writing, as it doesn’t have to seek like an HDD would.
One benefit of mirroring over copies=2 is that you can expand the mirror to another drive instead, if your hardware setup ever changes. There are probably other benefits too, though the downsides are that updates may break the manual partitioning, as it isn’t officially supported, and it does require initial setup this way.
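And if the hardware does change later, growing that into a real two-disk mirror is just an attach/detach (device names here are hypothetical):

```
# resilver a partition on the new disk into the existing mirror vdev
zpool attach rpool sda2 sdb2

# once resilvering completes, drop one of the two same-disk partitions
zpool detach rpool sda3
```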