SSD / NVMe - Bits need to be rewritten to maintain performance and integrity

My understanding is that TrueNAS/ZFS checks data to make sure it is correct and if it finds issues can self heal.

In the video below (linked at the relevant time) from Steve Gibson (grc.com), he indicates that over time SSDs need their data to be rewritten every so often to prevent bit rot and maintain performance. ZFS can correct for bit rot, but how does it handle SSD performance loss if the data appears correct?

Does TrueNAS rewrite all of the data every so often or just rewrites data when it finds an issue?

Thanks

No. Neither TrueNAS nor ZFS rewrites data automatically. They only re-write data when a problem is found.

Data does not generally “appear correct”; it either is or is not correct. That is the purpose of the checksum, which is stored external to the data block.
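To illustrate the point about the checksum living outside the data block, here is a hypothetical Python sketch. It uses SHA-256 as a stand-in for ZFS’s actual checksum algorithms (fletcher4 or sha256), and a plain dict as a stand-in for the pool; it is a toy model, not how ZFS is implemented.

```python
import hashlib

def write_block(store, addr, data):
    """Store data and return its checksum, which is kept *outside*
    the block (ZFS keeps it in the parent block pointer)."""
    store[addr] = data
    return hashlib.sha256(data).digest()

def read_block(store, addr, expected_checksum):
    """Read a block and verify it against the externally stored checksum."""
    data = store[addr]
    if hashlib.sha256(data).digest() != expected_checksum:
        raise IOError("checksum mismatch: block is corrupt")
    return data

store = {}
csum = write_block(store, 0, b"video frame payload")
assert read_block(store, 0, csum) == b"video frame payload"

# Simulate silent bit rot: the drive returns *wrong* data with no error.
store[0] = b"video frame pAyload"
try:
    read_block(store, 0, csum)
except IOError:
    pass  # detected, because the checksum lives outside the damaged block
```

Because the checksum is not stored with the data it protects, a block that decays silently cannot “verify itself” as correct; the mismatch is always caught on read.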

I have several SSDs used with ZFS. For the purposes of this discussion, I will limit it to known problems and what appeared to solve them. Both examples are from hardware purchased around 2014.


A 1TB SATA SSD in my old laptop was giving occasional corruptions, to the point of having to perform full restores. There was a Samsung firmware upgrade, but that did not make the problem go away completely. So I had less trust in the laptop and made a bootable backup on a USB flash drive.

This was in the first few years of use, before I instituted automatic, twice a month ZFS scrubs. After that, nothing ever happened, and it has been over 11 years since I bought that 1TB SSD. (To be clear, it is rarely turned on today, and mostly just gets OS updates every few weeks, with ZFS scrub, as I have a newer laptop.)


My miniature media server has both a 1TB mSATA SSD and a 2TB laptop HDD. It uses 50GB or so from each for Mirrored OS. The rest from each is striped, (aka NO redundancy), because I have good backups.

In the early days I did random ZFS scrubs, perhaps 4 or 6 times a year, but no real schedule. During those scrubs, ZFS would occasionally find a bad block in a video file, (video files are much larger, so statistically that makes sense). Both SSD & HDD experienced failures. I would restore and all would be good again. This happened at least twice a year, perhaps 20 files needed to be restored.

After I instituted automatic twice a month ZFS scrubs, ZERO bad blocks in the non-redundant media pool. That was 7 years ago, (my scrub script keeps a log… so I know the first automatic scrub).

Again to be fair, this miniature PC is fanless, so I added an external USB powered fan blowing across the top. Maybe that added to reliability, but I don’t remember when I added the fan. Perhaps 6 or 7 years ago.


A guess as to why this is happening is that ZFS scrubs will read failing, but not yet failed, blocks, causing the SSD, (or HDD), to automatically recover & relocate the data. Or possibly, in the case of the SSD, re-write the blocks to refresh the SSD memory cells.

Now could re-writing files, (or simply re-syncing ZFS Mirrors), improve the SSD’s performance? Maybe.

As for the Integrity question, in my very tiny sample size, it appears ZFS scrubs help even without redundancy.

2 Likes

This is a matter for the drive’s firmware.

1 Like

Now that you mention it, I do believe there is some kind of patrol read in some SSDs. Those check for failing, but not failed sectors and automatically correct them.

SSD cells degrade in 3 ways:

  1. Reads reduce the electrical charge in a cell, not by much, but after hundreds, (or thousands?), of reads the cell would likely need to be re-written to restore the electrical charge.
  2. Over time, cells leak charge. So shelf stability for single-level cells is higher than for dual-, triple- or quad-level cells. More than 1 bit per cell is basically analog voltage detection to determine bit state.
  3. After a certain number of writes, the flash memory cell’s silicon starts to degrade. Again, single-level cells are likely to last longer.
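The second point, that more bits per cell makes charge leakage more dangerous, can be sketched with a toy voltage-margin model. The numbers below are illustrative only, not from any NAND datasheet: the idea is just that 2^bits charge levels must fit in a fixed voltage window, so the margin between adjacent levels shrinks as bits per cell grow.

```python
# Toy model: a NAND cell stores 2**bits distinguishable charge levels
# within a fixed voltage window. The margin between adjacent levels,
# and hence the tolerance to charge leakage, shrinks as bits per cell grow.
VOLTAGE_WINDOW = 1.0  # normalized full-scale window (assumed)

def level_margin(bits_per_cell):
    levels = 2 ** bits_per_cell
    # gap between adjacent charge levels, as a fraction of the window
    return VOLTAGE_WINDOW / (levels - 1)

for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    print(f"{name}: {2**bits} levels, margin = {level_margin(bits):.3f} of window")
```

Under this toy model an SLC cell has the full window as its margin, while a QLC cell has roughly 1/15th of it, which is why a small charge leak that an SLC drive shrugs off can flip a bit on a QLC drive.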

Now I am no expert in SSD and flash memory, this was written from memory of reading various articles over the years.

1 Like

I would encourage people to watch the video, as this issue is discussed by drive experts, and TrueNAS needs to be updated to compensate for it, especially since SMART data and things like that are being hidden intentionally. I remember recent changes to the SMART page, and the podcast saying that the TN team wants all of this managed behind the scenes.

If it is an issue, I would expect it needs to be handled by the drive manufacturers and the base OS. Windows, Linux, BSD, etc. TrueNAS would just follow Debian and ZFS changes.

1 Like

Um, no, it’s a three-hour video. And Steve Gibson can be a bit eccentric, to say the least; he has a long history of (to put it charitably) exaggeration. Can you show some documentation supporting your apparent claim that:

  • SSDs need to be periodically rewritten,
  • Their own firmware doesn’t do this on its own,
  • A default Debian installation also doesn’t do whatever needs to be done on its own, and therefore
  • TrueNAS needs to affirmatively do something to make this happen?

We’re pretty much all in agreement on the first point. The rest? Not so much, I think.

7 Likes

As mentioned, the video link is at the actual time of the question. It is not 3 hours; the relevant segment is about 5 to 10 minutes, and it is a critical issue.

Yes, SSDs need to be periodically rewritten for performance and bit-rot prevention. This needs to be initiated by TrueNAS.

(citation needed–other than Steve Gibson)

4 Likes

If I could contact the TrueNAS staff I would. I think Steve Gibson is eminent enough to quote for this to be looked at, and if you actually watch the video, the answer is given by an industry leader in SSD storage, not Gibson.

:waving_hand:

Assuming you mean this part:

SSDs will autonomously rewrite data for purposes of garbage collection, consolidating NAND pages to larger blocks, and yes - for refreshing blocks that have a weakened or marginal charge state. The thing is that most vendors are reluctant to (read: “don’t”) release details of their firmware process to that level, but thankfully Kioxia was proud enough of doing this on even eMMC/UFS devices now that they wrote a tech brief to brag:

A full device read will do it, because all of the data is being read and checksummed. Your SSD will either:

  1. Return the correct data (nothing’s wrong)
  2. Detect the marginal charge, rewrite it to a new page and update its FTL (Flash Translation Layer) inside of firmware, and return the correct data (you may experience a stall depending on the order in which these operations occur, if it waits to refresh the cell or commit the data elsewhere before returning it)
  3. Detect an unrecoverably low charge, throw a URE upstream, ZFS will compare it to the checksum, reply “that’s not cool” and recalculate the lost data from parity[1] and write it back to the disk outside of firmware - at this point the disk writes it to a new location, tries to refresh the bad/weak cell, and if it can’t, tags it as toast and grabs one from its spare NAND

In scenario #3 we should already capture that your disk threw that URE, and if you lose too many cells we’ll report a low spare block reserve level.
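The three read outcomes above can be sketched as a toy simulation. Everything here is illustrative: the charge thresholds, function names, and return strings are invented for the sketch, not taken from any firmware or from ZFS itself.

```python
# Toy model of the three read outcomes during a full-device read (e.g. a scrub).
def drive_read(cell_charge):
    """Simulate the SSD firmware's view of one cell on read (thresholds invented)."""
    if cell_charge > 0.7:
        return "ok"          # scenario 1: data returned cleanly
    if cell_charge > 0.3:
        return "refreshed"   # scenario 2: marginal - firmware rewrites the
                             # page, updates its FTL, returns correct data
    return "URE"             # scenario 3: unrecoverable - error goes upstream

def scrub_read(cell_charge, has_redundancy=True):
    """What the pool sees: the drive fixes scenario 2 itself; ZFS fixes scenario 3."""
    result = drive_read(cell_charge)
    if result == "URE":
        # ZFS catches the failed checksum and rebuilds from mirror/parity
        return "rebuilt-from-redundancy" if has_redundancy else "permanent-error"
    return result

assert scrub_read(0.9) == "ok"
assert scrub_read(0.5) == "refreshed"
assert scrub_read(0.1) == "rebuilt-from-redundancy"
assert scrub_read(0.1, has_redundancy=False) == "permanent-error"
```

The last assertion is the footnote’s point: without redundancy, scenario #3 has no repair path.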


Edit: If I’m understanding their (limited) documentation correctly, SpinRite’s “Level 3” and above “drive refresh” will wholesale accelerate decay of your SSDs because they believe that they’re somehow immune to the effects of the SSD Flash Translation Layer. If you write to LBA 00001234 a hundred times on an HDD, you’ll write to the same physical location - do that on an SSD and you’ll be walking across a whole raft of empty NAND pages, because it’s faster to program an empty page than erase an existing one.
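The LBA-versus-NAND-page point can be shown with a toy Flash Translation Layer. This is purely illustrative (class and field names are invented, and real FTLs also do wear leveling and garbage collection), but it captures why repeated writes to one LBA walk across fresh physical pages instead of hammering one location.

```python
# Toy FTL: each write to an LBA is redirected to a fresh physical page,
# because programming an empty page is faster than erasing one in place.
class ToyFTL:
    def __init__(self, total_pages):
        self.free_pages = list(range(total_pages))
        self.lba_to_page = {}  # logical block address -> physical NAND page

    def write(self, lba):
        page = self.free_pages.pop(0)  # always grab a fresh page
        self.lba_to_page[lba] = page   # the old page becomes garbage to collect
        return page

ftl = ToyFTL(total_pages=8)
pages_used = [ftl.write(0x1234) for _ in range(3)]
# Three writes to the *same* LBA consumed three *distinct* physical pages:
assert len(set(pages_used)) == 3
```

On an HDD those three writes would have landed on one physical sector; on the SSD they burned three pages of NAND, which is the decay-acceleration problem described above.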

Will it restore lackluster read performance? Yes, by paying the penalty in bulk up-front by reading every LBA; and if your drive lacks a reprogramming feature in firmware, you should probably just replace the drive with one that does. It’ll fix that at the cost of burning through an entire DWPD cycle (at Level 3 … Level 4 will do two, with the bonus feature of putting your data at risk while it does the pointless “inverted data” thing!)


TL;DR run regular scrubs to keep your data “fresh and clean”, let the drive fix any that it picks up as marginal/needing ECC correction first, and let ZFS handle the rest from your redundancy.[2]


  1. you do have redundancy in your vdevs, right? ↩︎

  2. seriously. don’t be out here running stripes. ↩︎

10 Likes

Does a scrub rewrite every bit on a drive?

[quote=“HoneyBadger, post:11, topic:62168”]

SSDs will autonomously rewrite data for purposes of garbage collection, consolidating NAND pages to larger blocks, and yes - for refreshing blocks that have a weakened or marginal charge state.

[/quote]

The scenario in the video states that the file is written once and then only read from then on, e.g. a video collection; over time the cell charge weakens and this causes performance issues. I think he mentions this is caused by electron tunnelling.

Your scenario above is not the same, yours has active disk writing but Gibson’s example is read only.

Also, not all drives are the same, and the features you mention might not be industry standard.

No, and it doesn’t need to - it will only rewrite those that fail the ZFS checksum.

That’s the issue, scrub is not enough.

You have error correction, but then you have a performance issue.

So in order to avoid the performance penalty incurred during a scrub, an existing feature which reads all data, can be set on a schedule during low-IO times, thereby minimizing the impact of any excessive read IO and potential correction of ECC or other disk errors, we should … program a feature that reads all data, can be set on a schedule during low-IO times, thereby minimizing the impact of any excessive read IO and potential correction of ECC or other disk errors?

2 Likes

TrueNAS handles error correction fine, including scrubs that read the whole drive, but it does not correct performance degradation on SSDs because it does not see it.

This is the issue I am trying to put forward.

Cells that hold files that are only ever read will suffer charge degradation. They will still function, and the drive and TN will not see an error, but over time there will be a drop in read speed. At some point in the future the data in those cells might fail to be read, and then the drive or ZFS would generate an error, at which point the data would be rewritten.

So I raised this about a performance drop, not data integrity.

What’s the performance drop, and over what amount of time? I figure TrueNAS Enterprise, and others, would have come across this a lot more if it were an issue. In enterprise, the hardware just needs to make it through to a hardware refresh. What is that, max 5-7 years? Why aren’t we seeing more problems currently, then? You only pointed to one video, and there aren’t any other sources. We need scientific repeatability, multiple sources, etc.

1 Like

@shoulders - I think you are missing a point.

ZFS scrubs will read all used blocks on an SSD. The SSD does the work to detect marginal cells, and relocates them as needed. Neither ZFS nor TrueNAS has to do anything unless a cell goes bad enough to fail the ZFS checksum. Then, ZFS will use whatever redundancy is available to rebuild the block, and re-write it. Regular ZFS scrubs likely prevent ZFS from ever seeing fading cells, (those the SSD can recover and relocate), only completely dead ones.

You are proposing to reduce the overall write life of an SSD for what most of us here consider minimal gain. Remember, writing to an SSD cell is the thing that will eventually kill the cell, more so when using multiple bits per cell. Meaning single-level SSD cells have the longest life, while quad-level, (or higher), cells have the shortest.

This is why it is not recommended to de-fragment SSDs. While reducing OS overhead can improve access speed, (because the OS thinks the blocks in a file are all in sequence), the extra writes just reduce the life of the SSD.

5 Likes

Occasional writing of static data is what I am proposing, the frequency of which is above my pay grade but the phenomenon is real.

I am sure there are real world examples.

Read and writing have a different effect here.

Yes, reading all of the data through a scrub will check the health of the drive at both the firmware and ZFS level for every allocated block, and rewrite them at the level where the fault is detected.

Writing will do the same, but blindly against every block, unnecessarily burning NAND P/E cycles.

5 Likes