Last month I made the jump from Electric Eel to Goldeye, following the upgrade guide in the documentation. It went buttery smooth. About a week later, I received a ‘failed a SMART selftest’ alert for one of my mirrored boot drives. Before I had even noticed the alert, it had cleared itself. As soon as I got home, I pulled the smartctl log, and it had no record of any failures… odd, I thought, and went about my day. About 36 hours later, the same alert appeared for the other mirrored boot drive… Both times, the alert cleared itself exactly 90 minutes later. This behavior has continued per the table below. What’s going on here?
Run these commands:
smartctl -x /dev/nvme0 > nvme0_smartctl.txt
nvme self-test-log --output-format=json /dev/nvme0 > nvme0_stl.txt
echo "==================================" >> nvme0_stl.txt
nvme error-log /dev/nvme0 >> nvme0_stl.txt
This will generate two files: nvme0_smartctl.txt, which is what smartctl reports, and nvme0_stl.txt, which is what the nvme command reports (both the self-test log and the error log).
Post both files here or feel free to send them to my email joeschmuck2023@hotmail.com.
These should shed some light on the issue, or possibly non-issue.
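Since the alert hit both mirrored boot drives, it may help to capture the same logs for each drive in one pass. The sketch below simply wraps the commands above in a loop. It assumes the two boot drives are /dev/nvme0 and /dev/nvme1 (pass other device paths as arguments if yours differ), and `collect_logs` is a hypothetical helper name, not part of TrueNAS.

```shell
#!/bin/sh
# Sketch: collect smartctl and nvme-cli logs for each boot drive.
# Assumption: the mirrored boot drives are /dev/nvme0 and /dev/nvme1;
# pass other device paths as arguments to override.

# Hypothetical helper: writes <name>_smartctl.txt and <name>_stl.txt
# in the current directory for one device.
collect_logs() {
    drive="$1"
    name="${drive##*/}"    # e.g. /dev/nvme0 -> nvme0
    smartctl -x "$drive" > "${name}_smartctl.txt"
    nvme self-test-log --output-format=json "$drive" > "${name}_stl.txt"
    echo "==================================" >> "${name}_stl.txt"
    nvme error-log "$drive" >> "${name}_stl.txt"
}

# Default to the two assumed boot drives when no arguments are given.
if [ "$#" -eq 0 ]; then
    set -- /dev/nvme0 /dev/nvme1
fi

for dev in "$@"; do
    # Skip paths that do not exist instead of writing empty log files.
    if [ ! -e "$dev" ]; then
        echo "skipping $dev: device not found" >&2
        continue
    fi
    collect_logs "$dev"
    echo "wrote ${dev##*/}_smartctl.txt and ${dev##*/}_stl.txt"
done
```

Run it as root from a scratch directory; it leaves one pair of files per drive, named the same way as the single-drive commands.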
@joeschmuck I went through your Drive Troubleshooting guide again and still did not find any cause for concern. If you agree, I’ll raise this as a bug report for TrueNAS SCALE 25.10.
Great time of year to be doubly retired in the state I grew up in.
They are mounted directly on the motherboard of a CWWK N5105 NAS.
I don’t think I’ve run a manual SMART test on that machine since I first built it years ago. I’ll report it as a bug. Hopefully the devs can glean something from it.
If that is the case, then the two SMART self-tests that were run on that drive were from Goldeye. I don’t see anything wrong with the drive, and it could be that a drive with no recent self-test is what triggers the software’s alarm. Maybe Goldeye was looking for the last SMART test and said “Holy Cow! I need to run a SMART test, signal the alarm, and run the test.” Once for the Extended test first, then the Short test second. I’m just guessing; I have not looked at the code to see exactly what it is doing.
If they tell you it is a problem, please post the problem report number here.