SSD / NVMe - Bits need to be rewritten to maintain performance and integrity

My understanding is that TrueNAS/ZFS checks data to make sure it is correct and if it finds issues can self heal.

In the video below (linked at the relevant time) from Steve Gibson (grc.com), he indicates that over time SSDs need their data to be rewritten every so often to prevent bit rot and maintain performance. ZFS can correct for bit rot, but how does it handle SSD performance loss if the data appears correct?

Does TrueNAS rewrite all of the data every so often or just rewrites data when it finds an issue?

Thanks

No. Neither TrueNAS nor ZFS rewrites data automatically. They only re-write data when a problem is found.

Data does not generally “appear correct”; it either is or is not correct. That is the purpose of the checksum, which is stored external to the data block.
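To illustrate the point about the checksum living outside the data block, here is a hypothetical Python sketch. It uses SHA-256 as a stand-in for ZFS’s actual checksum algorithms (fletcher4 or sha256), and a plain dict as a stand-in for the pool; it is a toy model, not how ZFS is implemented.

```python
import hashlib

def write_block(store, addr, data):
    """Store data and return its checksum, which is kept *outside*
    the block (ZFS keeps it in the parent block pointer)."""
    store[addr] = data
    return hashlib.sha256(data).digest()

def read_block(store, addr, expected_checksum):
    """Read a block and verify it against the externally stored checksum."""
    data = store[addr]
    if hashlib.sha256(data).digest() != expected_checksum:
        raise IOError("checksum mismatch: block is corrupt")
    return data

store = {}
csum = write_block(store, 0, b"video frame payload")
assert read_block(store, 0, csum) == b"video frame payload"

# Simulate silent bit rot: the drive returns *wrong* data with no error.
store[0] = b"video frame pAyload"
try:
    read_block(store, 0, csum)
except IOError:
    pass  # detected, because the checksum lives outside the damaged block
```

Because the checksum is not stored with the data it protects, a block that decays silently cannot “verify itself” as correct; the mismatch is always caught on read.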

I have several SSDs used with ZFS. For the purposes of this discussion, I will limit it to known problems and what appeared to solve them. Both examples are from hardware purchased around 2014.


A 1TB SATA SSD in my old laptop was giving occasional corruptions, to the point of having to perform full restores. There was a Samsung firmware upgrade, but that did not make the problem go away completely. So I had less trust in the laptop and made a bootable backup on a USB flash drive.

This was in the first few years of use, before I instituted automatic, twice a month ZFS scrubs. After that, nothing ever happened, and it has been over 11 years since I bought that 1TB SSD. (To be clear, it is rarely turned on today, and mostly just gets OS updates every few weeks, with ZFS scrub, as I have a newer laptop.)


My miniature media server has both a 1TB mSATA SSD and a 2TB laptop HDD. It uses 50GB or so from each for Mirrored OS. The rest from each is striped, (aka NO redundancy), because I have good backups.

In the early days I did random ZFS scrubs, perhaps 4 or 6 times a year, but no real schedule. During those scrubs, ZFS would occasionally find a bad block in a video file, (video files are much larger, so statistically that makes sense). Both SSD & HDD experienced failures. I would restore and all would be good again. This happened at least twice a year, perhaps 20 files needed to be restored.

After I instituted automatic twice a month ZFS scrubs, ZERO bad blocks in the non-redundant media pool. That was 7 years ago, (my scrub script keeps a log… so I know the first automatic scrub).

Again to be fair, this miniature PC is fanless, so I added an external USB powered fan blowing across the top. Maybe that added to reliability, but I don’t remember when I added the fan. Perhaps 6 or 7 years ago.


A guess as to why this is happening is that ZFS scrubs will read failing, but not yet failed, blocks, causing the SSD, (or HDD), to automatically recover & relocate the data. Or possibly, in the case of the SSD, re-write the blocks to refresh the SSD memory cells.

Now could re-writing files, (or simply re-syncing ZFS Mirrors), improve the SSD’s performance? Maybe.

As for the Integrity question, in my very tiny sample size, it appears ZFS scrubs help even without redundancy.

2 Likes

This is a matter for the drive’s firmware.

1 Like

Now that you mention it, I do believe there is some kind of patrol read in some SSDs. Those check for failing, but not failed sectors and automatically correct them.

SSD cells degrade in 3 ways:

  1. Reads reduce the electrical charge in a cell, not by much, but after hundreds, (or thousands?), of reads the cell would likely need to be re-written to restore the electrical charge.
  2. Over time, cells leak charge. So shelf stability for single-level cells is higher than for dual-, triple- or quad-level cells. More than 1 bit per cell is basically analog voltage detection to determine bit state.
  3. After a certain number of writes, the flash memory cell’s silicon starts to degrade. Again, single-level cells are likely to last longer.
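The second point, that more bits per cell makes charge leakage more dangerous, can be sketched with a toy voltage-margin model. The numbers below are illustrative only, not from any NAND datasheet: the idea is just that 2^bits charge levels must fit in a fixed voltage window, so the margin between adjacent levels shrinks as bits per cell grow.

```python
# Toy model: a NAND cell stores 2**bits distinguishable charge levels
# within a fixed voltage window. The margin between adjacent levels,
# and hence the tolerance to charge leakage, shrinks as bits per cell grow.
VOLTAGE_WINDOW = 1.0  # normalized full-scale window (assumed)

def level_margin(bits_per_cell):
    levels = 2 ** bits_per_cell
    # gap between adjacent charge levels, as a fraction of the window
    return VOLTAGE_WINDOW / (levels - 1)

for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    print(f"{name}: {2**bits} levels, margin = {level_margin(bits):.3f} of window")
```

Under this toy model an SLC cell has the full window as its margin, while a QLC cell has roughly 1/15th of it, which is why a small charge leak that an SLC drive shrugs off can flip a bit on a QLC drive.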

Now I am no expert in SSD and flash memory, this was written from memory of reading various articles over the years.

1 Like

I would encourage people to watch the video, as this issue is discussed by drive experts, and TrueNAS needs to be updated to compensate for it, especially since SMART data and things like that are being hidden intentionally. I remember recent changes to the SMART page, and the podcast saying that the TN team wants all of this managed behind the scenes.

If it is an issue, I would expect it needs to be handled by the drive manufacturers and the base OS. Windows, Linux, BSD, etc. TrueNAS would just follow Debian and ZFS changes.

1 Like

Um, no, it’s a three-hour video. And Steve Gibson can be a bit eccentric, to say the least; he has a long history of (to put it charitably) exaggeration. Can you show some documentation supporting your apparent claim that:

  • SSDs need to be periodically rewritten,
  • Their own firmware doesn’t do this on its own,
  • A default Debian installation also doesn’t do whatever needs to be done on its own, and therefore
  • TrueNAS needs to affirmatively do something to make this happen?

We’re pretty much all in agreement on the first point. The rest? Not so much, I think.

7 Likes

As mentioned, the video link is at the actual time of the question. It is not 3 hours; the relevant segment is about 5 to 10 minutes, and it is a critical issue.

Yes, SSDs need to be periodically rewritten for performance and bit-rot prevention. This needs to be initiated by TrueNAS.

(citation needed–other than Steve Gibson)

4 Likes

If I could contact the TrueNAS staff I would. I think Steve Gibson is eminent enough to quote for this to be looked at, and if you actually watch the video, the answer is given by an industry leader in SSD storage, not Gibson.

:waving_hand:

Assuming you mean this part:

SSDs will autonomously rewrite data for purposes of garbage collection, consolidating NAND pages to larger blocks, and yes - for refreshing blocks that have a weakened or marginal charge state. The thing is that most vendors are reluctant to (read: “don’t”) release details of their firmware process to that level, but thankfully Kioxia was proud enough of doing this on even eMMC/UFS devices now that they wrote a tech brief to brag:

A full device read will do it, because all of the data is being read and checksummed. Your SSD will either:

  1. Return the correct data (nothing’s wrong)
  2. Detect the marginal charge, rewrite it to a new page and update its FTL (Flash Translation Layer) inside of firmware, and return the correct data (you may experience a stall depending on the order in which these operations occur, if it waits to refresh the cell or commit the data elsewhere before returning it)
  3. Detect an unrecoverably low charge, throw a URE upstream, ZFS will compare it to the checksum, reply “that’s not cool” and recalculate the lost data from parity[1] and write it back to the disk outside of firmware - at this point the disk writes it to a new location, tries to refresh the bad/weak cell, and if it can’t, tags it as toast and grabs one from its spare NAND

In scenario #3 we should already capture that your disk threw that URE, and if you lose too many cells we’ll report a low spare block reserve level.
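The three read outcomes above can be sketched as a toy simulation. Everything here is illustrative: the charge thresholds, function names, and return strings are invented for the sketch, not taken from any firmware or from ZFS itself.

```python
# Toy model of the three read outcomes during a full-device read (e.g. a scrub).
def drive_read(cell_charge):
    """Simulate the SSD firmware's view of one cell on read (thresholds invented)."""
    if cell_charge > 0.7:
        return "ok"          # scenario 1: data returned cleanly
    if cell_charge > 0.3:
        return "refreshed"   # scenario 2: marginal - firmware rewrites the
                             # page, updates its FTL, returns correct data
    return "URE"             # scenario 3: unrecoverable - error goes upstream

def scrub_read(cell_charge, has_redundancy=True):
    """What the pool sees: the drive fixes scenario 2 itself; ZFS fixes scenario 3."""
    result = drive_read(cell_charge)
    if result == "URE":
        # ZFS catches the failed checksum and rebuilds from mirror/parity
        return "rebuilt-from-redundancy" if has_redundancy else "permanent-error"
    return result

assert scrub_read(0.9) == "ok"
assert scrub_read(0.5) == "refreshed"
assert scrub_read(0.1) == "rebuilt-from-redundancy"
assert scrub_read(0.1, has_redundancy=False) == "permanent-error"
```

The last assertion is the footnote’s point: without redundancy, scenario #3 has no repair path.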


Edit: If I’m understanding their (limited) documentation correctly, SpinRite’s “Level 3” and above “drive refresh” will wholesale accelerate decay of your SSDs because they believe that they’re somehow immune to the effects of the SSD Flash Translation Layer. If you write to LBA 00001234 a hundred times on an HDD, you’ll write to the same physical location - do that on an SSD and you’ll be walking across a whole raft of empty NAND pages, because it’s faster to program an empty page than erase an existing one.
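The LBA-versus-NAND-page point can be shown with a toy Flash Translation Layer. This is purely illustrative (class and field names are invented, and real FTLs also do wear leveling and garbage collection), but it captures why repeated writes to one LBA walk across fresh physical pages instead of hammering one location.

```python
# Toy FTL: each write to an LBA is redirected to a fresh physical page,
# because programming an empty page is faster than erasing one in place.
class ToyFTL:
    def __init__(self, total_pages):
        self.free_pages = list(range(total_pages))
        self.lba_to_page = {}  # logical block address -> physical NAND page

    def write(self, lba):
        page = self.free_pages.pop(0)  # always grab a fresh page
        self.lba_to_page[lba] = page   # the old page becomes garbage to collect
        return page

ftl = ToyFTL(total_pages=8)
pages_used = [ftl.write(0x1234) for _ in range(3)]
# Three writes to the *same* LBA consumed three *distinct* physical pages:
assert len(set(pages_used)) == 3
```

On an HDD those three writes would have landed on one physical sector; on the SSD they burned three pages of NAND, which is the decay-acceleration problem described above.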

Will it restore lackluster read performance? Yes, by paying the penalty in bulk up-front by reading every LBA; and if your drive lacks a reprogramming feature in firmware, you should probably just replace the drive with one that does. It’ll fix that at the cost of burning through an entire DWPD cycle (at Level 3 … Level 4 will do two, with the bonus feature of putting your data at risk while it does the pointless “inverted data” thing!)


TL;DR run regular scrubs to keep your data “fresh and clean”, let the drive fix any that it picks up as marginal/needing ECC correction first, and let ZFS handle the rest from your redundancy.[2]


  1. you do have redundancy in your vdevs, right? ↩︎

  2. seriously. don’t be out here running stripes. ↩︎

10 Likes

Does a scrub rewrite every bit on a drive?

[quote=“HoneyBadger, post:11, topic:62168”]

SSDs will autonomously rewrite data for purposes of garbage collection, consolidating NAND pages to larger blocks, and yes - for refreshing blocks that have a weakened or marginal charge state.

[/quote]

The scenario in the video states that the file is written once and then only read from then on, e.g. a video collection; over time the cell charge weakens and this causes performance issues. I think he mentions this is caused by electron tunnelling.

Your scenario above is not the same, yours has active disk writing but Gibson’s example is read only.

Also, not all drives are the same, and the features you mention might not be industry standard.

No, and it doesn’t need to - it will only rewrite those that fail the ZFS checksum.

That’s the issue, scrub is not enough.

You have error correction, but then you have a performance issue.

So in order to avoid the performance penalty incurred during a scrub, an existing feature which reads all data, can be set on a schedule during low-IO times, thereby minimizing the impact of any excessive read IO and potential correction of ECC or other disk errors, we should … program a feature that reads all data, can be set on a schedule during low-IO times, thereby minimizing the impact of any excessive read IO and potential correction of ECC or other disk errors?

2 Likes

TrueNAS handles error correction fine, including scrubs that read the whole drive, but it does not correct performance degradation on SSDs because it does not see it.

This is the issue I am trying to put forward.

Cells that hold files that are only ever read will suffer charge degradation. They will still function, and the drive and TN will not see an error, but over time there will be a drop in read speed. At some point in the future the data in those cells might fail to be read, and then the drive or ZFS would generate an error, at which point the data would be rewritten.

So I raised this about a performance drop, not data integrity.

What’s the performance drop, and over what amount of time? I figure TrueNAS Enterprise, and others, would have come across this a lot more if it were an issue. In enterprise, the hardware just needs to make it through to a hardware refresh. What is that, max 5-7 years? Why aren’t we seeing more problems currently, then? You only pointed to one video, and there aren’t any other sources. We need scientific repeatability, multiple sources, etc.

1 Like

@shoulders - I think you are missing a point.

ZFS scrubs will read all used blocks on an SSD. The SSD does the work to detect marginal cells, and relocates them as needed. Neither ZFS nor TrueNAS has to do anything unless a cell goes bad enough to fail the ZFS checksum. Then, ZFS will use whatever redundancy is available to rebuild the block, and re-write it. Regular ZFS scrubs likely prevent ZFS from ever seeing fading cells, (those the SSD can recover and relocate), only completely dead ones.

You are proposing to reduce the overall write life of an SSD for what most of us here consider minimal gain. Remember, writing to an SSD cell is the thing that will eventually kill the cell, more so when using multiple bits per cell. Meaning single-level SSD cells have the longest life, while quad-level, (or higher), cells have the shortest.

This is why it is not recommended to de-fragment SSDs. While reducing OS overhead can improve access speed, (because the OS thinks the blocks in a file are all in sequence), the extra writes just reduce the life of the SSD.

5 Likes

Occasional writing of static data is what I am proposing, the frequency of which is above my pay grade but the phenomenon is real.

I am sure there are real world examples.

Read and writing have a different effect here.

Yes, reading all of the data through a scrub will check the health of the drive at both the firmware and ZFS level for every allocated block, and rewrite them at the level where the fault is detected.

Writing will do the same, but blindly against every block, unnecessarily burning NAND P/E cycles.

5 Likes