ZFS pools & power loss - you won't lose data

There are sometimes misconceptions about ZFS losing data due to power loss. The subject of power loss, (or an OS crash), followed by not being able to import a ZFS pool, comes up here in the forums a few times a month. Here is the skinny on what should happen.

There are several things to unpack about how power loss affects ZFS and data loss.

  1. Any data previously stored in a ZFS pool cannot be, and is not, lost on power loss. The exception is hardware failures that affect pool redundancy, like the loss of both disks in a 2-way Mirror vDev.
  2. Any data that has not yet been written, (still in flight), is lost, just like with every other file system out there.
  3. Using a SLOG, (Separate intent LOG), is a specialized case that only helps synchronous writes. Enterprises might mirror SLOGs “just in case” a SLOG device fails during a power loss or OS crash.
  4. ZFS attempts to be always consistent on disk, so after a crash / power loss, no file system check is needed. I say “attempts” because consumer hardware fails more often than enterprise hardware, and that can lead to data loss.
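For anyone unsure what a “synchronous write” means in point 3: it is a write where the application is not told “done” until the data is on stable storage, typically by calling fsync(). Below is a minimal Python sketch, assuming a POSIX-like system; the file name is hypothetical. On a ZFS dataset with the default sync=standard setting, an fsync() like this is what gets logged to the ZIL, and a SLOG, if present, is where that log lives. Ordinary asynchronous writes are not logged to the ZIL under sync=standard, so a SLOG does nothing for them.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write `data` to `path`, returning only after the OS reports it is on stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's user-space buffer into the kernel
        os.fsync(f.fileno())  # block until the kernel/file system reports the data is durable


if __name__ == "__main__":
    # Demo in a throwaway directory; on a real system this would be a file on a pool dataset.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "journal.bin")
        write_durably(path, b"commit record")
        with open(path, "rb") as f:
            print(f.read())
```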

This specific power loss issue was a design criterion of ZFS: no data loss on power loss. Given no hardware failures, (RAM, storage controller, storage device, etc…), and no bugs in ZFS, (rare, but it has happened), there is zero chance of ZFS losing data on a crash or power loss. The exception, of course, is data in flight. Data is either completely written or not written at all.
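The “completely written or not” property comes from copy-on-write: new data goes to unused space, and only a final small write switches the live pointer from the old tree to the new one. The Python below is a toy model of that idea, not ZFS's actual on-disk format; the class and method names are hypothetical, and it assumes the pointer switch itself is atomic.

```python
class CowDisk:
    """Toy copy-on-write store: a dict of blocks plus one live root pointer.

    Hypothetical model for illustration; not how ZFS lays out data on disk.
    """

    def __init__(self):
        self.blocks = {}     # address -> contents; never overwritten in place
        self.root = None     # address of the live root block (the "activating" pointer)
        self.next_addr = 0

    def _write_new(self, contents):
        # Copy-on-write: every write goes to a fresh, unused address.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = contents
        return addr

    def commit(self, records, crash_before_activate=False):
        data_addrs = [self._write_new(r) for r in records]  # new data in unused space
        new_root = self._write_new(tuple(data_addrs))       # new root pointing at it
        if crash_before_activate:
            return  # power lost here: the live root was never switched
        self.root = new_root  # assumed atomic: one small write flips old -> new

    def read_all(self):
        if self.root is None:
            return []
        return [self.blocks[a] for a in self.blocks[self.root]]


disk = CowDisk()
disk.commit(["a", "b"])
disk.commit(["a", "b", "c"], crash_before_activate=True)  # simulated crash mid-update
print(disk.read_all())  # ['a', 'b']: the old, consistent state survives
disk.commit(["a", "b", "c"])
print(disk.read_all())  # ['a', 'b', 'c']
```

A crash before the pointer switch simply leaves the previous version live, which is why no file system check is needed afterwards.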

On the other hand, we have seen enough cases to know why pool corruption has occurred:

  • Use of hardware RAID disk controllers, (yes, even JBOD mode can be bad).
  • Using a proper LSI HBA, but really old firmware has led to problems, some appearing to cause data loss.
  • Use of USB attached drives, (some with hardware RAID controllers set to JBOD mode).
  • Virtualization of TrueNAS but not passing through the disk controller, (just a virtual disk or the plain disk).

Any of these can lead to out-of-order disk writes, which can cause ZFS pool corruption during a crash. Since the corruption only occurs due to a crash, (power loss or OS crash), people say “But it worked for months / years!”. That is the point. ZFS was designed to handle crashes IF you give it the proper hardware. If not, you may or may not get lucky.

Additionally, virtualizing TrueNAS with Proxmox without blacklisting the HBA / disk controller can corrupt a pool. Without blacklisting, Proxmox may import the pool at the same time as TrueNAS. That is very bad, as ZFS was not designed as a shared file system.

On the subject of UPSes: a UPS is a good thing, even if it only provides a few minutes of runtime, but it is not strictly necessary.

  • As a minimum, it will extend the life of your hardware through fewer surges, power dips, transient outages, etc.
  • Some hardware will fail or lose data because of graceless power loss.
  • When the cleaning lady comes through and plugs the vacuum into the wrong circuit, it doesn’t interrupt your workflow …

Addendum (2025/10/11):
To clarify out-of-order writes: many hardware RAID controllers have a battery backed up RAM cache used for writes. They also optimize writes by doing elevator seeks. This means they may write data out of order, the same way an elevator stops at floors based on proximity, not on the order in which people got into the elevator.

For example, 3 people get on the elevator at the ground floor. The first wants the 10th floor, the second wants the 5th floor and the third wants the 3rd floor. The elevator stops at 3, 5 and 10, in that order. But for data integrity, ZFS needs the writes to happen as 10, 5 and, last, 3, because the last write activates the prior writes.
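To make the analogy concrete, here is a tiny Python sketch of elevator-style reordering, using a LOOK-style sweep (a variant of the classic SCAN “elevator” algorithm). It is a toy: real controllers use their own scheduling, and the floor numbers just stand in for disk block addresses.

```python
def elevator_order(requests, start=0):
    """Order requests the way a simple elevator (LOOK-style sweep) would serve them:
    first everything at or above `start` going up, then everything below going down.
    """
    going_up = sorted(r for r in requests if r >= start)
    going_down = sorted((r for r in requests if r < start), reverse=True)
    return going_up + going_down


issued = [10, 5, 3]  # the order the passengers got on, i.e. the order ZFS issued the writes
print(elevator_order(issued))            # [3, 5, 10]: served by proximity
print(elevator_order(issued) == issued)  # False: not the order ZFS depends on
print(elevator_order(issued, start=6))   # [10, 5, 3]: the result depends on where the "head" is
```

The issuer does not control the final order at all; it depends on where the controller happens to be, which is why a setup can look fine for a long time.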

If a power loss lasts longer than the battery backed up RAM cache on a hardware RAID card can survive, then the remaining writes are lost. With out-of-order writes, those lost writes could have been critical to the integrity of the pool.
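Putting the two addenda together, the toy model below lets power fail after every possible number of drained writes and checks whether the activating write reached the disk without the writes it depends on. It assumes a single activating write issued last (floor 3 in the example) and uses the same simple sweep as a stand-in for the controller's scheduling; both are illustrative assumptions, not how any particular RAID card behaves.

```python
def elevator_order(requests, start=0):
    """LOOK-style sweep: requests at/above `start` ascending, then the rest descending."""
    up = sorted(r for r in requests if r >= start)
    down = sorted((r for r in requests if r < start), reverse=True)
    return up + down


def crash_outcomes(issued, activator, reorder=True, start=0):
    """For every point at which power could die while the cache drains,
    report whether the blocks that reached the disk form a consistent pool.

    Assumption (toy model): `activator` is the final write that makes all the
    other writes in `issued` live; the pool is corrupt if it lands without them.
    """
    drain = elevator_order(issued, start) if reorder else list(issued)
    needed = set(issued) - {activator}
    outcomes = []
    for k in range(len(drain) + 1):  # k = writes that made it out before power died
        on_disk = set(drain[:k])
        outcomes.append(activator not in on_disk or needed <= on_disk)
    return outcomes


print(crash_outcomes([10, 5, 3], activator=3, reorder=False))  # [True, True, True, True]
print(crash_outcomes([10, 5, 3], activator=3, reorder=True))   # [True, False, False, True]
```

Draining in issue order is safe at every crash point; the reordered drain leaves two windows where the pool would be inconsistent. Nothing goes wrong unless power dies inside one of those windows, which matches the “it worked for months” experience.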


Addendum (2025/12/10):
Added the note about Proxmox virtualization and blacklisting the HBA / disk controller.


Comments?
Corrections?
Additions?


Looks right, other than some typos:

  • s,loose,lose,g;
  • s,their,there;

I would also add that a UPS, while not strictly necessary, is a Really Good Thing for a file server even if it only has a few minutes of run time. As a minimum, it’ll extend the life of your hardware through fewer surges, power dips, transient outages, etc. Plus when the cleaning lady comes through and plugs the vacuum into the wrong circuit, it doesn’t interrupt your workflow …

But that’s not specific to ZFS, no.


Done.

I also still recommend a UPS. Ideally, who wants any data loss at all? Power losses can definitely affect hardware, especially brownouts. People are always worried about power surges, but brownouts can kill too! And once you affect the hardware, you can affect TrueNAS.

I agree: the way ZFS works, it should never cause corruption or a lost pool just because of a power outage. Has it happened? Yes, there’s been a bug here and there. Overall, it’s solid.

I would add another to the list. There are many people using good HBAs, but ancient firmware too!

I’ve added the HBA firmware issue, and the UPS. Give those a read and let me know if something could be improved.


Looks good for the most part. I do still think a UPS is required if you want your machine to keep working. Low power (and I’ve had low power for 10 minutes before) can fry many electronics in your house, and it has for me. If it’s shorter, it can weaken components and make it seem like the “power failure” caused the problem. Then very strange things can happen with equipment that may no longer be running to spec. I used to be in that business, and low power situations cause so many issues.

My specific fear is the weakening of components, as that makes troubleshooting very hard (if it doesn’t outright fry them). It may not cause immediate loss, but the system gradually starts having various unexplained issues.

Anyway, since the topic is losing a pool, perhaps your writeup is good. Maybe I am expanding beyond lost pools. I just think a UPS is absolutely necessary. It is for anything I run! But I’ve lost half a house of stuff from brownouts before, lol.

Perhaps create a separate UPS resource explaining the various issues, and possibly reference this resource for the ZFS side.


Thanks for writing this up for everyone to use. The community needs more of course. I am incapable at this time of writing things up, not my skill (though I have done a couple) and have other problems. People like you, Stux, etc. help move stuff along for folks trying to understand more. I try and pitch in comments where I can, but time is limited.

Now if someone had the time to write up a noob Plex install guide, the forum would have a lot less posts! :joy:


Looks good.

A separate UPS would be good too.

I have always used server hardware with redundant power supplies, and I use dual UPS systems on the server rack, which also powers the switches, router and other equipment. Each server is supplied by both UPS systems, and the rest of the equipment is split between the two. I never have to worry about data loss during a brownout or other power issue, or about waiting for the rack to reboot after a power failure while the generator kicks in.


One other problem I thought of: storage device write caches. ZFS was originally written when HDDs had write caches that could reorder writes, (aka elevator seek). On power loss, (or OS crash), that could cause data loss.

At one point, (around 2010), Sun Microsystems recommended disabling HDD write caches when using ZFS because of this problem.

Eventually this problem was overcome by implementing and using write barriers. This meant that all the preliminary writes for a ZFS transaction, (aka data write), could be known to have been flushed to media, (HDD sectors, SSD / NVMe flash memory), before the final write that activated them. As a result, ZFS remained always consistent on stable media and would not lose data on power loss or OS crash.


However, USB controllers, (with or without hardware RAID controller), could still implement elevator seeking with their write cache. And it is possible that low end flash storage devices, (whether that’s USB, SATA or NVMe), may not implement write cache correctly. Expecting low cost / low end storage devices to be perfect would be asking too much.

So, caveat emptor, (buyer beware).


A very interesting topic — nicely put together!
Like so many things in storage, it’s a pretty deep rabbit hole…

ZFS will always do its best to keep data consistent, but that also depends on the hardware being honest and cooperative.

From what I’ve observed, most HDDs have a write cache sized roughly to hold about one second of sequential write data.
When ZFS sends a write command and the device responds “OK”, ZFS assumes the data is safely on disk.
Practically all HDDs report a “write OK” status as soon as the data has reached their internal cache — that’s the whole point of having a write cache.
This behavior is completely normal.
The real problem starts when drives “lie” — that is, when they also report “write OK” even after receiving a flush command while the data is still only in volatile cache and hasn’t actually been written to the platters yet.
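The difference between a normal write acknowledgement and a lying flush can be shown with a small Python model. ToyDrive and its lies_on_flush flag are hypothetical; the point is only that ZFS relies on the flush acknowledgement, so a drive that acknowledges a flush while the data is still in volatile cache silently breaks that guarantee.

```python
class ToyDrive:
    """Toy drive with a volatile write cache.

    `lies_on_flush` is a hypothetical flag modelling firmware that acknowledges
    a cache flush without actually writing the cache to the medium.
    """

    def __init__(self, lies_on_flush=False):
        self.cache = {}    # volatile: contents vanish on power loss
        self.medium = {}   # non-volatile: survives power loss
        self.lies_on_flush = lies_on_flush

    def write(self, lba, data):
        self.cache[lba] = data
        return "OK"        # normal behaviour: acknowledge once the data is in cache

    def flush(self):
        if not self.lies_on_flush:
            self.medium.update(self.cache)
            self.cache.clear()
        return "OK"        # a lying drive reports success either way

    def power_loss(self):
        self.cache.clear()


for lies in (False, True):
    drive = ToyDrive(lies_on_flush=lies)
    drive.write(0, "data")
    drive.flush()          # after this "OK", ZFS treats block 0 as durable
    drive.power_loss()
    print(lies, drive.medium)
# False {0: 'data'}
# True {}
```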

This so-called “Write Cache Lie” was indeed an issue in the past, especially with cheaper consumer drives or early SSDs, where some firmware ignored flush commands to make benchmark results look better.
I’d like to think (and hope) that the reputable HDD manufacturers no longer play such games today…
So, if things go wrong during a sudden power loss, that “one second” of in-flight data could still be in jeopardy, depending on the drive type.

Consumer drives (“non-NAS”) are usually the most vulnerable here; data still in cache would likely be lost.
NAS-class drives seem to have slightly more aggressive cache-flush strategies, reducing the risk somewhat, but they still don’t have real power-loss protection.

Enterprise drives, on the other hand, typically include true PLP (Power-Loss Protection) using onboard capacitors that keep the controller alive long enough to safely flush the cache — ensuring the data really makes it to the platters.

So in the end, it really comes down to what other members mentioned before: ZFS can do an excellent job, as long as you don’t sabotage it with unreliable hardware.


Yes, that is what I meant by write barriers.

I don’t know the exact specifics that ZFS uses, but it is probably something like this:

  • Write data & metadata for that data, to unused space, (aka COW, Copy On Write)
  • Flush write caches
  • Write critical metadata to activate the above writes
  • Flush write caches

Even though it may seem redundant to have 2 write cache flushes, it is extremely important that the critical metadata is not written until AFTER the other writes. A power loss could prevent the actual data or standard file metadata, (directory entries), from being written. If the critical metadata had already been written to activate them, we would now have a corrupt pool.
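The four steps above can be checked exhaustively with a small Python model. It assumes, as a simplification, that between flushes a drive may persist any subset of its cached writes, and that an acknowledged flush means everything issued before it is on stable media. The step and block names mirror the list above and are not ZFS internals.

```python
from itertools import combinations


def crash_states(ops):
    """Every on-media state a crash could leave behind, for a list of steps.

    Toy assumptions: between flushes the drive may persist any subset of the
    writes it has cached (i.e. reorder freely); once a flush is acknowledged,
    every write issued before it is on stable media.
    """
    durable, pending, states = set(), [], []
    for step in ops + [None]:  # None = "crash after the last step"
        # Power fails right here: any subset of the pending writes may have landed.
        for r in range(len(pending) + 1):
            for subset in combinations(pending, r):
                states.append(durable | set(subset))
        if step is None:
            break
        kind, block = step
        if kind == "write":
            pending.append(block)
        else:  # "flush"
            durable |= set(pending)
            pending.clear()
    return states


def consistent(on_media, activator="critical_metadata"):
    # Corrupt if the activating write is on media but something it points to is not.
    return activator not in on_media or {"data", "file_metadata"} <= on_media


with_barrier = [("write", "data"), ("write", "file_metadata"), ("flush", None),
                ("write", "critical_metadata"), ("flush", None)]
without_barrier = [("write", "data"), ("write", "file_metadata"),
                   ("write", "critical_metadata"), ("flush", None)]

print(all(consistent(s) for s in crash_states(with_barrier)))     # True
print(all(consistent(s) for s in crash_states(without_barrier)))  # False
```

Without the first flush, a crash can leave the critical metadata on disk with nothing behind it. The second flush does not affect this ordering check; what it provides is the guarantee that, once it returns, the activating write itself is durable, so the transaction can be treated as complete.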

I wish there were such a thing: every UPS I’ve ever seen has far more run time than this. It’d be nice if someone made a very low-cost, small, efficient, USB-connected UPS with only 5 minutes of run-time. It doesn’t take very long to finish pending writes and power down a server, so why do I need a gigantic lead-acid battery system and more than 1 hour runtime? Just keep the server running for a few minutes after power failure, and if the power isn’t restored by then, take 20 more seconds and shut it down.

You can set up timers with NUT (Network UPS Tools) for clean shutdowns if you really only want your UPS to run for 5 minutes.
I also use the extra wattage headroom of my UPS for things like switches, access points, monitors, etc.
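The “run a few minutes, then shut down” policy is easy to express. In NUT this is usually configured with upssched timers; the Python below is not NUT code, only a model of the start-on-battery / cancel-on-line logic so the behaviour is easy to reason about. The event format and the 300-second grace period are assumptions chosen for the example.

```python
def shutdown_time(events, grace=300):
    """When should a clean shutdown start, given UPS status changes?

    `events` is a time-ordered list of (seconds, status) pairs, where status is
    "OB" (on battery) or "OL" (on line power), borrowing NUT's status names.
    Policy modelled: start a timer when the UPS goes on battery, cancel it when
    line power returns, shut down if it runs out. Returns None if no shutdown.
    """
    started = None
    for t, status in events:
        if started is not None and t >= started + grace:
            return started + grace  # timer expired before this event (ties go to the timer)
        if status == "OB" and started is None:
            started = t
        elif status == "OL":
            started = None
    return None if started is None else started + grace


print(shutdown_time([(0, "OB"), (120, "OL")]))               # None: short blip, keep running
print(shutdown_time([(0, "OB")]))                            # 300: still on battery, shut down at t=300
print(shutdown_time([(0, "OB"), (400, "OL")]))               # 300: power came back too late
print(shutdown_time([(0, "OB"), (100, "OL"), (200, "OB")]))  # 500: timer restarted at t=200
```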

My home server sits in a shoe closet with only a small WiFi router, nothing else. This is my point: I don’t need some huge UPS system, nor do I want to pay for it, or deal with the crappy huge lead-acid batteries that only last a year or two before crapping out. I don’t really even have room for something large. I want something a bit larger than a power bank, with lithium batteries and a short runtime, but the market doesn’t really have this.


Sure it does. For example, here is one on Amazon: GOLDENMATE 1500VA/1000W Lithium UPS Battery Backup and Surge Protector, Backup Battery Power Supply with LiFePO4 Batteries(296Wh), AVR, Line Interactive Sinewave UPS System, 8 Outlets, LCD Display

It has a comms port to issue a shutdown, a 10-year battery life, a lower TCO than lead-acid, looks good on a desk and is not that large (16.73"D x 6.3"W x 11.81"H). It’s not any more expensive than any other average UPS of similar size.

Any vendor that causes a drive to lie in response to SYNCHRONIZE CACHE deserves to have their right to use any T10-controlled logos or marks (eg: SAS/SCSI/etc) pulled.

On a related note, the manufacturer that I still own a drive from that did that in the past is no longer in business, having filed for bankruptcy and their assets sold off. :wink:

The SYNCHRONIZE CACHE (10) command (see table 170) requests that the device server ensure that the specified logical blocks have their most recent data values recorded in non-volatile cache and/or on the medium, based on the SYNC_NV bit. Logical blocks include user data and, if the medium is formatted with protection information enabled, protection information.

HDDs report a “write OK” to a regular asynchronous write, but ZFS is specifically sending that SYNCHRONIZE CACHE and waiting for the OK response to certain bits. So we’ll let it handle async writes with the volatile cache and its own firmware spooling it down to disk, but ZFS will eventually force the issue of “hey if you haven’t flushed that by now we’re gonna sit here and wait until you do” when a transaction group is closed off.


There are reports of these failing when power is unstable:


We need to redirect the UPS discussions to a different thread. While UPSes are a good thing, (when they work), they are not that relevant to the subject:

ZFS pools & power loss - you won’t lose data

To minimize any risk, it’s wise to pair ZFS with a reliable UPS. For home or small lab setups, something like the APC Back-UPS Pro 1500 or CyberPower CP1500PFCLCD provides both battery backup and line conditioning, giving you enough time to safely shut down during outages. For enterprise setups, a larger online double-conversion UPS ensures clean power and no interruptions.
