One or more devices has experienced an unrecoverable error. Not sure of cause

Hey All,
I was greeted with the following error when I checked on my NAS today. Seems that yesterday it encountered an error:

Pool WDBlue state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

zpool status yielded the following; however, I’m not really sure what to make of it.

root@freenas[~]# zpool status -v
pool: WDBlue
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using ‘zpool clear’ or replace the device with ‘zpool replace’.
see: #
scan: resilvered 412K in 00:00:01 with 0 errors on Tue Apr 8 06:12:14 2025
config:

    NAME                                            STATE     READ WRITE CKSUM
    WDBlue                                          ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/937f67a7-85c9-11ea-b2b1-bcee7b88cb02  ONLINE       0     0   246
        gptid/940de06e-85c9-11ea-b2b1-bcee7b88cb02  ONLINE       0     0     0
        gptid/610c5be1-e61b-11ed-8549-bcee7b88cb02  ONLINE       0     0   222
        gptid/c1eb9620-3f14-11ef-a7fd-b49691196b6c  ONLINE       0     0   204
        gptid/f20bfd5e-0b6a-11f0-8cea-b49691196b6c  ONLINE       0     0   210

What should my next course of action be? I did some searching through the forums and discovered that this can be caused by cables, and I was in the system a week or two ago to replace a failed disk. It seems odd that it would be fine for weeks, though.

Results of the resilver:

RESILVER
Status: FINISHED
Errors: 0
Date: 2025-04-08 06:12:13

Name         Read  Write  Checksum  Status
/mnt/WDBlue  0     0      0         ONLINE
RAIDZ2       0     0      0         ONLINE
ada3         0     0      246       ONLINE
ada0         0     0      0         ONLINE
ada2         0     0      222       ONLINE
ada4         0     0      204       ONLINE
ada1         0     0      210       ONLINE

I’m still pretty new to ZFS, but does this imply that there are/were errors on 4/5 of my disks? That also seems unlikely to me.

Welcome to TrueNAS and its forums!

There is insufficient data to draw any conclusions; full hardware please, as well as version of TrueNAS Core.

Guessing at the disk model based on the pool name “WDBlue” suggests that you are using disks that are unsuitable for use with ZFS. Western Digital Blue disks may be SMR (Shingled Magnetic Recording), a technology known to be problematic with ZFS due to timeout issues with re-shingling.

Further, desktop drives, like WD Blue, may have other problems that Enterprise or NAS specific drives avoid:

  • Very fast head park timeout. This can quickly wear out the head movement mechanism and/or the parking mechanism.
  • Long timeout for error recovery. WD calls the feature that limits this TLER (Time Limited Error Recovery); Seagate and others call it something else. In essence, desktop drives are assumed not to be part of a redundant array, and thus take extreme measures on bad blocks to attempt recovery. ZFS may consider the block or drive failed because of the delay.

Both of the above problems can sometimes be solved with drive tunables. Generally, setting the head parking timeout to minutes instead of seconds takes care of that problem. Conversely, setting TLER to 5 or 7 seconds instead of the default of over a minute solves the other.
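As a rough sketch of the TLER part: smartmontools exposes this as SCT Error Recovery Control via smartctl -l scterc, with the limits given in tenths of a second. This assumes the drive supports the feature at all (many desktop drives do not). The helper name scterc_arg is made up for illustration, and ada0 is only a placeholder device name.

```shell
# Hypothetical helper: turn whole seconds into the argument for smartctl -l.
# smartctl takes the read and write limits in tenths of a second.
scterc_arg() {
    tenths=$(( $1 * 10 ))
    printf 'scterc,%d,%d\n' "$tenths" "$tenths"
}

scterc_arg 7    # prints scterc,70,70

# On the NAS itself (as root; ada0 is a placeholder device name):
#   smartctl -l scterc /dev/ada0                # show the current limits
#   smartctl -l "$(scterc_arg 7)" /dev/ada0     # ask for 7.0 s read and write
```

On many drives the setting reportedly does not survive a power cycle, so it may need to be reapplied at boot.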

Now checksum errors generally are not caused by the problems above. RAM, disk controller, disk data cables, power supply and such are more likely.


Welcome to TrueNAS and its forums!

Thank you, I’ve been a lurker for a number of years, however, until now I’ve been able to solve most of my issues through old posts or the manual.

full hardware please as well as version of TrueNAS Core.

TrueNAS Core Version: TrueNAS-13.0-U6.7

Hardware:
CPU: i5-4570
Motherboard: Asus HB7M-E
NIC: INTEL I350-T4
Power Supply: 650W
Storage:
Main Pool: 2x WD Blue 2TB, 3x Seagate IronWolf 8TB
Boot Drive: Samsung SM863 2.5" 480GB SATA III

Let me know if you need more than that.

Guessing at the disk model based on the pool name “WDBlue” suggests that you are using disks that are unsuitable for use with ZFS

I knew I would catch some flak for that pool name. It is a leftover from when I first dipped my toe into TrueNAS. Being tight on cash and new to ZFS, I built it with 5x 2TB WD Blue drives. Later, after I found out about the CMR/SMR thing and that NAS drives were not the same as your run-of-the-mill desktop variety, I began to replace them with 8TB Seagate IronWolf drives as the original WD Blue ones failed over the last 7 years. The pool has two of the original WD Blue drives left, and three of the new IronWolfs. Hopefully the last two die before I need the extra space.

Now checksum errors generally are not caused by the problems above. RAM, disk controller, disk data cables, power supply and such are more likely.

Given that I was in the system for a drive replacement a couple of weeks ago, I decided that was a good place to start. I opened it up and reseated all of the SATA cables as well as the RAM. On boot no errors showed up, but I would assume I need to run a scrub before I’ll see checksum errors getting flagged?

Also, looking at the results of my previous pool status, can I conclude that none of my files are currently corrupt? It would list the corrupt files if there were any, correct? Just trying to ascertain the validity of my latest backup, which occurred after this error surfaced.

Thanks for the help so far :)

Glad to hear you’ve got an understanding of the SMR issue and have taken action as needed.

Make a note of the existing errors (well, you basically did here in the forum), and then clear them with zpool clear WDBlue.

Now, either let it run normally and see if the errors pop up again, or manually initiate a ZFS scrub. Then check for errors afterwards.
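Put together, the clear / scrub / check sequence could look something like the sketch below. The function name recheck_pool and the ZPOOL override variable are made up for illustration; the zpool subcommands themselves (clear, scrub, status -v) are the standard ones.

```shell
# Hypothetical wrapper: clear the recorded error counters, start a scrub,
# then show the pool status. Set ZPOOL=echo to preview the commands
# without actually running them.
recheck_pool() {
    pool="$1"
    zpool_cmd="${ZPOOL:-zpool}"
    "$zpool_cmd" clear "$pool" &&
    "$zpool_cmd" scrub "$pool" &&
    "$zpool_cmd" status -v "$pool"
}

# On the NAS (as root):
#   recheck_pool WDBlue
```

The scrub runs in the background, so the status shown right after starting it will only say that a scrub is in progress. Run zpool status -v WDBlue again once it has finished and compare the CKSUM column with the numbers you noted.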

Correct, there is no indication that there is data loss.
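For a check that can be repeated later: zpool status -v normally ends with an errors: line, which is not part of the output pasted earlier in the thread. When no files are affected it reads "errors: No known data errors"; otherwise the affected files are listed beneath it. A minimal sketch that tests for that line (the helper name is made up):

```shell
# Hypothetical helper: succeeds if the status text on stdin reports
# no known data errors.
no_known_data_errors() {
    grep -q '^[[:space:]]*errors: No known data errors'
}

# On the NAS:
#   zpool status -v WDBlue | no_known_data_errors && echo "no damaged files reported"
```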


However, you did not list your memory size. Odd behavior has been noticed before when using less than the recommended amount. It is not likely the cause of your ZFS pool checksum errors, just something we in the forums check when given a new server listing.

You may want to run a days-long memory check. I don’t know the easiest method with TrueNAS, but you can get a USB flash drive and load the MemTest86 software on it. Then gracefully shut down TrueNAS and boot from that USB flash drive.


As it stands, your pool provides 6 TB of storage (ca. 5 TB usable), which is less than just one of these three Ironwolves. Irrespective of failures, you have an obvious benefit to replacing these two WD Blues with larger drives at your earliest convenience…
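The 6 TB figure follows from RAIDZ2 arithmetic: two drives’ worth of space goes to parity, and every member is treated as being the size of the smallest one. A quick sketch with a made-up helper name; it ignores ZFS overhead and the TB/TiB conversion, which is presumably where the roughly 5 TB usable comes from:

```shell
# Hypothetical helper: raw RAIDZ2 capacity in TB.
# Arguments: number of disks, size of the smallest disk in TB.
raidz2_raw_tb() {
    echo $(( ($1 - 2) * $2 ))
}

raidz2_raw_tb 5 2    # current pool, limited by the 2 TB drives: prints 6
raidz2_raw_tb 5 8    # all five members at 8 TB: prints 24
```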

That’s a relief.

Whoops, my bad. The system has 20GB of RAM in it.

I’ll dig up a USB stick and see about doing that.

I ran a scrub of the pool overnight and it returned zero checksum errors. I wish I could find something definitive that points to the culprit, but for now at least it seems to have resolved itself. Thanks for the help, it’s much appreciated.

Fair; however, right now I’m only using ~50% of my pool’s capacity, so for me there is no immediate rush to replace the last two.
