Single drive ZFS pool for cold storage? (With offline backups)

I have a very simple idea here and I just want to check whether it will work before I implement it.

I have a couple of terabytes of archive data, and it grows by about 500 GB a year.

Currently this is all on a rather new 18TB Toshiba, plus another two 3TB Toshibas for backup, all formatted as NTFS.

I don't need speed, and I don't need auto-repair from a RAID array.

But I do want checksums, data-integrity checks, and bitrot protection.

So this is the idea:

I build a small home server with an SSD that runs TrueNAS (please suggest which version suits my needs).

That server contains only the 18TB Toshiba, formatted with ZFS.

The archive data is dumped to it. When I need to access that data from my MacBook, I turn the server on, and if I've mounted the pool properly it will be accessible over the network.

Now:

The cold backup stays in an IcyBox external enclosure, formatted as NTFS, and the two 3TB Toshibas go to the basement (they will be the third backup for the most important stuff).

Once every couple of months I run a scrub on the single Toshiba in the small server, just to check for bitrot errors.

If there are any, I check the data manually and restore it from the external NTFS backup drive.
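For reference, the manual check-and-warn cycle described above boils down to two commands on the ZFS box (the pool name `archive` is a placeholder):

```shell
# Read every block in the pool and verify it against its checksum.
zpool scrub archive

# After the scrub finishes: shows checksum error counters and, under
# "errors:", lists any files with unrecoverable corruption.
zpool status -v archive
```

On a single-drive pool the scrub cannot repair data errors, but `zpool status -v` will still name the damaged files, which is exactly the warning-only behaviour being asked about.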

Will this work okay? I really, really don't need the auto-recovery options; I just want the warning about bitrot or corruption so that I can recover manually.

Also, if I work this way, do I eliminate the dangers of not having ECC memory, considering that if things go south my backups are offline anyway?

Is there a possibility that ZFS corrupts the data on the main drive because I don't have ECC memory, without telling me about it, and then I accidentally overwrite the backups with the corrupted data?

Thanks!


Just to add something I didn't explain well…

My reasoning behind not needing ECC RAM is this: if I block ZFS from auto-healing and only use it to warn me about bitrot problems, and I do the damaged-file restoration manually, then I'm not at such a risk of having corrupted data from malfunctioning RAM written all over the place?


TN is probably not the solution you need. If all you really want is the ZFS file system, just build a basic Debian system with everything on the 18TB drive, or boot from a USB stick or tiny SSD. You get all of the ZFS goodness without a NAS appliance to manage. You can use TN, but I think you'd be adding overhead. However, this is far from how I would architect a solution: I would use TN with at least a mirror (2x 18TB drives) or a RAIDZ1 with 3 smaller drives, and use the single 18TB for backup. Add Backblaze and/or TrueCloud backup and you have a great solution with multilevel protection.

My miniature media server does not have redundancy for its media pool. Neither do the multiple backup disks. All are ZFS and scrubbed when possible (backup disks before use, and the media pool twice a month).

One thing I noticed with my online media pool: without regular scrubs, I regularly got bitrot corruption, perhaps 4 times a year. Easy enough to fix the corrupt file(s) from one of my backups.

But the moment I enabled twice-monthly ZFS scrubs on that media pool, there was ZERO bitrot detected. In years! It seems regular reading of the media files allowed the 2 disks (a 1TB mSATA SSD and a 2TB HDD) to find weak sectors before they failed completely. The disks then applied internal error correction and "fixed" the problem BEFORE ZFS ever detected an issue.

What I am saying is that powering on this archive server only every 2 months or so may not be often enough to prevent that type of bitrot. Or maybe the fact that the 18TB disk is powered down will be good enough and will prevent the heads from weakening the magnetic fields.


I do agree with @Theo that TrueNAS does seem like overkill for a simple archive used just a few times a year.


I figured that this is also an option, but when I googled it I realized that both Debian and Ubuntu don't support ZFS out of the box and I'd have to install it on top of the OS… That's not a problem for me in itself, it's just… I'm not 20 anymore and I want a solution that is as "set and forget" as possible, and these Linux setups without native support for things remind me of unstable systems that have taken more time from me in the past than they gave back in functional use…

TrueNAS, on the other hand, seems like an out-of-the-box solution that is easily expandable in case I really like it, decide to build a server with ECC memory in a year or two, and add more disks to a RAID array…

So on one hand it seems like a good tool that I would like to learn something about. That's why I thought I would go with it.


You made a good point here. Exactly what I was looking for. This means my solution will work, but I will probably create additional work for myself, compared to having a simple RAID 1 mirror that heals itself and just copying that to the cold backup.

The problem is that if I let it auto-scrub, I get scared of using non-ECC RAM. If I'm in manual mode and the scrub reports errors and does nothing, I can go check the files and fix them myself. If the system is wrong, I'll notice.

Whereas with auto scrubs I'm scared that ZFS might corrupt things without me knowing about it.

On the other hand, there are used Lenovo ThinkStations with ECC RAM to be had for a couple hundred euros… It doesn't matter if it's an older system with a 4-core Xeon and something like 8 GB of RAM; is that enough for my needs?

With that amount of data you could set copies=2 on the 18TB Toshiba and have redundancy and bitrot protection with a single drive, at the cost of half the capacity.
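If you go that route, one caveat worth knowing: the copies property only applies to data written after it is set; existing files keep a single copy until they are rewritten. A minimal sketch (pool and dataset names are placeholders):

```shell
# Assumption: a pool "archive" with a dataset "archive/data" already exists.
zfs set copies=2 archive/data    # every block written from now on gets 2 copies
zfs get copies archive/data      # verify the property took effect
# Files that existed before the change only gain a second copy once they
# are rewritten (e.g. copied off and back onto the dataset).
```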

16 GB RAM would be better than 8 GB but the general answer is “yes, it should”.


The copies=2 option is also fine, but my logic behind not having TrueNAS auto-repair itself was that I can get a non-ECC Lenovo SFF PC for €100 and be okay with that, because I'm not scared it will corrupt something with bad RAM: it's not writing anything, just warning me, and then I fix the corruption manually.

If I went with the €500 option (a used Lenovo P520 with a Xeon and ECC) then I would also invest in two hard drives for mirroring and call it a day.

Bit flips in RAM and bitrot on the drive are two different mechanisms of data corruption, independent of each other. So your logic is somewhat flawed: skipping ECC will not make drive storage safer.


I know, but if ZFS auto-repair is not turned on, then it's not passing so much hard-drive data through the RAM automatically without me knowing it. So I didn't say that skipping ECC will make my storage safer. I said that skipping auto-healing, given that I don't have ECC, will make my storage safer.

That's still a flawed calculation, since you're counting on a rare RAM event occurring exactly when and where it shouldn't, coinciding with a rare HDD event. The probability of that scenario is infinitesimal.

I agree.

I would think that the bigger and more active the array is, the larger this possibility becomes. But for manual scrubs used just for warnings, it should be minimal.

I'll probably go with a simple SFF PC and one drive with external backups for now, and then if I like the system I'll invest in a decent workstation with ECC and mirroring.

I have 20TB stored on my 70TB array with 96GB of non-ECC memory; the chance of a memory-related issue with ZFS is VERY rare, and ZFS is better than other file systems at detecting and correcting errors. My focus is on many copies across RAIDZ2 arrays. A random bit flip is 9th on my list of things I care about, and bitrot is 20th, since my arrays are hot.

One thing to consider: if the non-ECC RAM server is powered off for 2 months or so, then perhaps a few passes of a RAM check should be run before putting it back into service. Then, after boot, run a ZFS scrub (on a non-redundant pool it won't heal data, only detect errors) to check the existing data. THEN add the new archival data.
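That power-on routine could be scripted roughly as below. The pool name is a placeholder, and `memtester` is one assumed tool for an in-OS RAM check; a boot-media run of memtest86+ is the more thorough option:

```shell
# Rough sketch of the suggested power-on routine (run as root).
memtester 1024 3        # stress-test 1024 MB of RAM for 3 passes (adjust size)

zpool scrub archive     # verify all existing data before writing anything new
zpool status archive    # repeat until it reports the scrub finished with 0 errors

# Only after a clean RAM test and a clean scrub, copy the new archive data in.
```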


Please note that ZFS has some unusual special features. While "copies=2" makes 2 copies of DATA, it also increases all MetaData by one copy, up to a maximum of 3.

Standard MetaData covers things like directory entries, which by default under "copies=1" still have 2 copies. To reiterate: MetaData, both standard AND critical, has more than 1 copy by default. This allows self-healing of MetaData even on non-redundant pools.

I believe I have actually seen this in action on my miniature media server's non-redundant media pool. If I remember correctly, I saw a failure that did not list a file to repair (so no known permanent errors), yet there was an error that got healed. Hence my assumption that it was MetaData.

The redundancy is controlled by:

copies=1/2/3
redundant_metadata=all|most|some|none

And the defaults are:

copies=1
redundant_metadata=all
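Both are ordinary dataset properties, so they can be inspected and changed per dataset (the dataset name below is a placeholder):

```shell
# Show the current data/metadata redundancy settings for one dataset.
zfs get copies,redundant_metadata archive/data

# Explicitly keep full metadata redundancy (this is already the default).
zfs set redundant_metadata=all archive/data
```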

For reference, the original ZFS designers considered MetaData more important than Data, because the loss of a directory entry (or other, more critical MetaData) could result in the loss of an entire file, or multiple files. Thus all critical MetaData has 3 copies by default, and standard MetaData has 2 copies by default.

It was also thought by those original ZFS developers that the amount of MetaData would remain a small fraction of storage compared to real Data.

Edit 2026/01/29:
I did forget the unusual use case of Internet news or messages. Those can be small, even tiny, files numbering in the tens or hundreds of thousands, even millions. In that use case of a massive number of tiny files, the ZFS MetaData storage requirement could easily equal the regular Data storage requirement, or even be double or triple it.

No file system is perfect… Remember, it can be even worse, since ZFS performs inline compression, so files that would normally take more space than their MetaData can end up smaller than the MetaData itself.


Let me also make a plug for offsite backup.

My suggestion: it’s great to have backups, but please consider adding an offsite backup. It can be a single drive inside a simple case, but get it out of your house / apartment.

Store it at work, in a safe deposit box, at a friend's house, wherever, but prevent a single event at your home from wiping out your entire digital footprint.


Will be done, at least for the most important 2TB of data.

I just have to figure out how to maintain the "very cold" offsite backups.

If I have a ZFS-based server at home, can the offsite backup drives be NTFS, with me just comparing them against the main drives for corruption? Or would you suggest ZFS for them too?

I would check up on those every 2 years or so.
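One way to make NTFS offsite drives verifiable on their own is to bring your own checksums: ZFS guards the copy on the server, and a manifest file lets you check the NTFS copy with plain coreutils on any Linux machine. A sketch using sha256sum (all paths are example placeholders):

```shell
#!/bin/sh
# Build a SHA-256 manifest of every file under a directory, and verify
# a copy of that directory against the manifest later.
set -eu

make_manifest() {   # $1 = directory to hash, $2 = manifest file to write
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 sha256sum ) > "$2"
}

verify_copy() {     # $1 = directory to check, $2 = manifest file
    ( cd "$1" && sha256sum -c "$2" --quiet )
}

# Example usage (paths are placeholders):
# make_manifest /mnt/archive /tmp/archive.sha256
# verify_copy   /media/offsite-ntfs /tmp/archive.sha256 && echo "offsite OK"
```

Keeping the manifest both on the server and on the offsite drive means either side can be checked independently every couple of years.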

In an ideal world it would be ZFS, because it is one of the few file systems that checksums everything (files, metadata, etc.), which can then be scrubbed occasionally, even at really long intervals, with any issues detected and resolved.

At the same time, I also see the benefit of having the offsite files in a format that a non-ZFS machine understands. It means fewer things to install and fewer hurdles between you and being able to use the data.

The other reason is that it's not as easy to keep a variety of data on the backup drives (i.e. archives, more current stuff, things you no longer want on the server, etc.) when they are essentially dedicated to a ZFS server: if you replicate to the drive, everything else on it is wiped.

For my use case, “sneakernet” offsite ZFS is still better, but the use case really matters.


YOU need ECC at the origin of your data.

ECC protects you from data being corrupted in RAM before it is written out to the HDD.

If such an issue happens, the already-corrupted data is written to the HDD, all checksums are calculated from the corrupted data, and ZFS will treat the bad data as correct.

Based on this, I would recommend to go the other way around:

  • Build a small, ECC-RAM-based system with 2x 18TB HDDs mirrored and TN, and use it as the original/main location for the data. Save all new data to this location in the first place.
  • Access the data through a network share.
  • Use your local single 18TB HDD in the Windows system as the backup instead.
  • If you still have resources free, buy an old office PC (anything from the last 10-15 years is more than enough for this; if you are lucky, you can even get one for free), install another 18TB drive and TN, and put it aside with, say, wake-on-LAN, running the backup job to it once every two weeks or once a month. This can also serve as your offsite backup.
  • All backups should be pulled or pushed from the ECC system. This is how you minimise the chance of data corruption due to the lack of ECC.
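A push from the ECC box could be sketched with ZFS replication like this (pool, dataset, snapshot and host names are all placeholders; the very first run needs a full send, without `-i`). TrueNAS wraps this same mechanism in its Replication Tasks UI:

```shell
# On the ECC main server: take a new snapshot, then send only the changes
# since the previous snapshot to the (woken-up) backup box over SSH.
zfs snapshot tank/archive@2025-06-01
zfs send -i tank/archive@2025-05-01 tank/archive@2025-06-01 | \
    ssh backup-box zfs receive -F backup/archive
```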

My first NAS ran Ubuntu Server with ZFS on Linux on top of it.

It worked well for many years.

I successfully installed TrueNAS yesterday and put some old 500GB drives into it so I can start with some testing. My first impressions are great! Thanks for the advice, I'll implement it.