Near on 50 hrs spent now returning to TrueNAS, rebuilding a server with the latest TNScale CE (having left the space around a decade ago). Gotta ask the obvious question: just how crucial are a couple of read/write errors!?
RAIDZ2, 8 discs, upgrading the pool slowly and laboriously from 2 TB (old Hitachi) to 4 TB (new WD Red Plus) discs. Put a new disc into the pool and an hr or so later KABOOM… it’s saying the pool is degraded due to that newly added disc! (A brand spanking new disc!) …would y’all tolerate just a few read/write errors? I mean, the Scrutiny app is just saying a 3% chance of failure…
I know from years back just how picky FreeNAS was, and I get why y’all are using it (how accurate and powerful the disc error reporting is with this awesome software), but in the real-world rough and tumble of getting on with your lives (like doing something else with your precious time!!!) …what is a generally accepted level of failure tolerance running this software with a RAIDZ2 pool? Are there any settings in Scrutiny (or 25.04.1 itself) that tone down the drive failure, aka “gotta replace that drive right now (or your pool is toast)!”, panic level? Thanks!
Brand new? I’d RMA/return to store asap & not even think about it. Otherwise? I’d investigate what could be causing the errors & try to mitigate. Maybe it isn’t the disk itself at fault.
IMO Raidz2 is there so you have extra time to resolve vs the ‘drop everything & restore redundancy’ of raidz1. It ain’t an ‘oh I’ll just wait for a 2nd one to start having issues, since I have 2 redundant disks’.
Waiting for a second disk to get errors/die & then resilvering a replacement drive is 100%, without any doubt, when you’ll have a third disk fail during the resilver (due to Murphy’s law).
That being said, not mission critical data? You got backups? It isn’t a production system? Send it.
I’d also argue there isn’t any generally agreed upon risk tolerance for personal NAS use. Too much of a personal thing, one that has to include the individual’s budget (money or time) vs attachment to data in question.
Well unfortunately you have not provided enough data to give you a proper answer. This may have nothing to do with the new drive or maybe it is the new drive.
Here is what you need to do:
1. What exactly does Kaboom mean?
2. Post the output of `zpool status -v`
3. Post the output of `smartctl -x /dev/sdX`, where X is the device ID for the drive.

Then we can tell you what to do next.
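The two outputs requested above can be gathered into one file for pasting. This is a minimal sketch, assuming a Bash shell run as root on the NAS; the function name `collect_diag` and the default file name `diag.txt` are placeholders made up for illustration, not anything TrueNAS ships. A tool that is not installed is noted in the file rather than treated as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather the outputs requested in this thread into one file.
# Usage: collect_diag /dev/sdX [outfile]   (replace /dev/sdX with the real device)
collect_diag() {
    local dev="$1"
    local out="${2:-diag.txt}"
    {
        echo "=== zpool status -v ==="
        if command -v zpool >/dev/null 2>&1; then
            zpool status -v 2>&1 || true
        else
            echo "zpool not found on this system"
        fi
        echo "=== smartctl -x ${dev} ==="
        if command -v smartctl >/dev/null 2>&1; then
            # smartctl's exit status is a bit mask, so non-zero is not always fatal
            smartctl -x "$dev" 2>&1 || true
        else
            echo "smartctl not found on this system"
        fi
    } > "$out"
    echo "wrote $out"
}
```

Calling `collect_diag /dev/sdX` with the real device in place of `sdX` writes `diag.txt` in the current directory, ready to paste between code fences.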
OR better yet, take a look at the Drive Troubleshooting Flowcharts I created, which are in the Resources section of our forums, or follow the link in my signature. They will take you step by step through diagnosing whether you have a drive problem or possibly a ZFS/computer problem. Way too many people jump to the wrong conclusion that if an error has occurred, the drive must be at fault. In your case it could be the data cable, or an unstable system. Yes, a system that was perfectly good last year could be unstable now.
Good Luck and feel free to post those outputs requested. While you are at it, maybe some of your system specs as well. These little things might not actually be little and contribute to problems.
EDIT: I forgot to say, are you sure the drive is not SMR? Posting the SMART output will confirm that.
I have no doubt about that, and you listed them as WD Red Plus, so those “should” be CMR, but what is actually in there? It may be all good, but I’d rather explore all the possibilities I can, sooner rather than later, so you can have a proper answer to your concerns and then move forward. But do not replace the drive without knowing if it is bad; right now it is speculation.
Modern drives (everything built within the last 30 years) have internal defect management. Under normal circumstances they always appear error-free. Any error is a sign of something wrong. This isn’t necessarily the disk itself; it can also be cabling, connectors, hotplug bays, controllers, power supplies, mainboards. I’ve seen strange pool errors that were caused by an insufficient power supply.
I have good backup.
The pool is also replicated on another striped 8TB larger disc in one of the SATA only bays
The faulty pool is “degraded but functional”… it’s 8X ZH2 and just the one drive/bay etc is down…
I am actually re-adding the last problem drive to the pool now in a SATA only bay
Another drive has the read/write errors now…
I suspect this is a drive bay issue/power supply/wire/ HBA issue???
This is 12-14 yr old gear I am rebuilding here… (See my other thread)
Functional and still powerful, hardly ever used, hasn’t been knocked around much, but it’s 12 year old enterprise gear (now with new SSDs for the TN OS and new 4TB WD drives)
Once I have the entire pool up and expanded I will share some more logs
Thanks everyone for being so helpful !
Storytime: My ex-brother-in-law used to work at Microsoft Tech Support for MS-DOS and Windows 3.11. He took a call from a woman (it could happen with a man as well) and she told him her plight. He told her to double click on the button on the screen. She said it didn’t work. They talked for a few moments and he told her to try it again, double click on the box on the screen. [a faint tap, tap heard in the background], again nothing worked. He asked her what that noise was; she said it was the sound of the mouse hitting the screen (glass in those days) as she double clicked the box. He shook his head and told her nicely that she should box the computer up and return it to where she bought it. Okay, he wanted to say that, but he did the right thing and told her exactly what to do. That was the kind of call he would get.
pool:
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: resilvered 750G in 02:09:59 with 0 errors on Mon Jun 2 00:49:27 2025
config:
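For anyone who wants to check these fields from a script rather than by eye, they can be pulled out of `zpool status` text. This is a rough sketch, assuming the output layout shown in this post; the function names `pool_state` and `scan_errors` are made up for illustration, and the sample fed to them is just two of the lines pasted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: both read `zpool status` text on stdin.

# Print the value of the first "state:" line, e.g. ONLINE or DEGRADED.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Print the error count reported on the "scan:" line, if one is present.
scan_errors() {
    sed -n 's/^[[:space:]]*scan:.* with \([0-9][0-9]*\) errors.*/\1/p'
}

# Sample taken from the output pasted in this thread.
sample='state: DEGRADED
scan: resilvered 750G in 02:09:59 with 0 errors on Mon Jun 2 00:49:27 2025'

echo "state: $(printf '%s\n' "$sample" | pool_state)"
echo "scan errors: $(printf '%s\n' "$sample" | scan_errors)"
```

On a live system the same functions could be fed with `zpool status | pool_state`. Note that the scan line counts errors found by the resilver itself, which is separate from the pool's DEGRADED state.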
really appreciate the help… it’s 7am here, cold, dark and I’ve gotta get to work earn some real dollars hit the road before the sun gets up
The traffic starts to get real bad real soon…
Will be back when I can in around 8-10 hrs
Please help those who try to help you:
COMPLETE hardware description, including motherboard, CPU, RAM, HBA and its firmware version, etc.
And, please, properly format terminal output with the </> button or by pasting the text between two lines of three backquotes
```
output
```