Unrecoverable Error - Please Help!

A knowledgeable friend helped me set up a TrueNAS installation several months ago. I put 2x18TB hard drives in and set it up with SMB to share with my PCs. It had been running like a dream until I happened to notice that I had two error messages.

Screenshot 2024-07-08 202220

I looked in the Pool at the drives and ran Short S.M.A.R.T. tests on both drives, and they both came back as passing. What was weird was that when I ran the tests, I got the following message, which is a clue that something is up.

I didn’t think about the time stamp at all until I looked at a file that I had just worked with and it showed the same 2015-01-02 date stamp. And when I went back to the original ada2 error message, it struck me that the date was a 2014 date… huh? Files stored on the hard drives are also showing 2015 dates on them. I looked at the system date and time and it has today’s date and time on it.

So, I started a LONG S.M.A.R.T. test on both drives before I turned in last night, and when I logged on this afternoon (16 hours later) the system showed that both tests were still running. Is that normal? That seemed excessive, so I thought that maybe the NAS just needed a good old-fashioned reboot. When it came back up and I ran the SHORT S.M.A.R.T. tests again, they both passed but still had the odd 2015 date, like in the screenshot I shared above. I looked in other tabs in the NAS and it also looks like the reporting databases are corrupted:

Finally, when I looked at the pools tab, the pool is online, but shows as unhealthy:

So I appear to be in trouble, but I do not know exactly what is wrong or, more importantly, what I need to do next. I am running Version TrueNAS-13.0-U6.1. I know some Linux/Unix basics (enough to run some commands in a shell), but would appreciate someone walking me through the steps I need to take to determine what is wrong and fix it.

Many thanks in advance!

Red

Do you have all the data stored somewhere other than your TrueNAS, as a backup?

Next, please post your detailed hardware and software setup.
See ‘Posting to get a response’ from the old forum:
https://www.truenas.com/community/threads/forum-rules.45124/


To rule out the GUI, run this on the command line (via SSH):

zpool status -v "Rosi Storage"
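
As a minimal sketch of what to look for in that output: the line that starts with "state:" carries the pool state (for example ONLINE or DEGRADED). The helper name pool_state below is made up for illustration; the pool name is the one from this thread.

```shell
# pool_state: hypothetical helper that reads "zpool status" output on stdin
# and prints the value of each "state:" line (one per pool).
pool_state() {
    awk '$1 == "state:" { print $2 }'
}

# On the NAS, the helper would be fed like this:
#   zpool status -v "Rosi Storage" | pool_state
# A quicker check that prints details only for pools with problems:
#   zpool status -x
```

The -v output also lists any files with known data errors at the bottom, so it is worth reading in full rather than only checking the state line.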

Thanks, Small Barky. I did look at the boards to see if there was anything like this already here, and while there are one or two threads with similar issues, in those situations it looked like only one drive was having an issue.

Backup? Yes, but I started consolidating things on the new NAS, so it would take some time to re-collect everything from other sources. Please tell me that’s not a real concern here… I get that it could be.

My TrueNAS box is a Dell Optiplex 9020 MiniTower, with a Dell motherboard and an Intel Core i5-4590 CPU. I put 64 GB of DDR3 in it. I have a 256GB Samsung SSD as the “OS” drive and then 2x18TB Seagate drives. I do not know the graphics card, but it is stock, on board - no discrete video cards or controllers.

Winnie Linnie - I got the following response when I ran the zpool command in the NAS Shell (I do not think that is what you meant by SSH, but that’s how I knew to check it):

What other information would be useful?

Red

I mentioned backup because your data is on a mirror pair, you have one ‘naughty’ disk, and resilvering with your disk size takes a long time and is a bit risky. You also have a strange problem with the date and time, and I was worried about corruption of some kind. The more good copies of data you have, the better.

I will let Winnie Linnie follow up; they have way more experience. I think the NAS shell should be fine. They will mention if you need to be in the Console or SSH, I think.


The pool looks fine.

This is starting to look like a GUI / middleware bug.


Do you see any errors logged in the drives’ SMART data?

smartctl -l error /dev/adaX

Replace “X” with 1, 2, and 3.
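
To run that over every drive in one go, a loop like the one below may help. It is only a sketch: the device names are assumptions, so adjust them to what your system shows, and smart_error_count is a made-up helper that pulls the number out of the "ATA Error Count:" line that smartctl prints for ATA drives with logged errors.

```shell
# smart_error_count: hypothetical helper. Reads "smartctl -l error" output on
# stdin and prints the number from the "ATA Error Count:" line, or 0 if that
# line is absent (smartctl prints "No Errors Logged" in that case).
smart_error_count() {
    awk '/^ATA Error Count:/ { print $4; found = 1; exit }
         END { if (!found) print 0 }'
}

# check_drives: runs the error-log query for each device name given.
check_drives() {
    for d in "$@"; do
        printf '%s: ' "$d"
        smartctl -l error "/dev/$d" | smart_error_count
    done
}

# On the NAS (device names are assumptions):
#   check_drives ada0 ada1 ada2
```

Any drive that reports a non-zero count deserves a look at the full smartctl -l error output, since the count alone does not say what kind of errors were logged.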

First off, thanks for the follow up. I ran the command on all three drives. ada0 and ada1 have no errors, but ada2 has a log of 20 errors. Here is the top of the report:



I did some additional digging and I think this might be useful. Something about the date issue was bugging me, so I went into the BIOS and looked at the clock. It was set to 2014 - which makes me think that the computer’s clock (is that a CMOS clock?) may have reset. The computer that is housing the TrueNAS is a 2015 machine. I am wondering if I have a bad CMOS battery - or one that needs replacing. Could a CMOS clock cause errors like this?
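
For a quick look at what the operating system itself thinks the date is, something like the sketch below could be run in the shell. The cutoff year 2024 is an assumption based on when this thread was written, and year_is_plausible is a made-up helper. It checks the OS clock only, which may differ from what the BIOS shows.

```shell
# year_is_plausible: hypothetical helper. Succeeds if the given year is not
# earlier than the cutoff (second argument, default 2024, an assumed value).
year_is_plausible() {
    [ "$1" -ge "${2:-2024}" ]
}

now_year="$(date +%Y)"
if year_is_plausible "$now_year"; then
    echo "OS clock year looks plausible: $now_year"
else
    echo "OS clock year looks wrong: $now_year"
fi
```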

Red

:warning: You need to replace ada2 ASAP.

Is ada2 your boot drive, or is it used in your storage pool vdev?

ada0 is my boot drive. ada2 is one of the pool drives.

Another data point. After I went into the BIOS and reset the CMOS clock, TrueNAS is no longer reporting any warnings about the pool:


Red

A hard drive with 20 uncorrectable errors needs to be replaced as soon as possible. Even if the GUI reports no issues after a reboot.

At minimum, run a “Long” self-test on ada2.
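
As a rough guide to how long that takes: a long self-test reads the entire disk surface, so the run time scales with capacity. The sketch below does the back-of-the-envelope arithmetic. The 200 MB/s average read rate is an assumption, not a measured figure for these drives, and estimate_hours is a made-up helper; the drive’s own estimate appears in the output of smartctl -c.

```shell
# estimate_hours: hypothetical helper. Rough time for one full read of a disk.
#   $1 = capacity in TB (decimal, 1 TB = 1,000,000 MB)
#   $2 = assumed average read rate in MB/s
estimate_hours() {
    echo $(( $1 * 1000000 / $2 / 3600 ))
}

estimate_hours 18 200   # 18 TB at an assumed 200 MB/s

# On the NAS, the test itself would be started and checked with:
#   smartctl -t long /dev/ada2        # start the extended self-test
#   smartctl -c /dev/ada2             # capabilities, incl. expected duration
#   smartctl -l selftest /dev/ada2    # self-test log with results and progress
```

With those assumptions the estimate comes out at about 25 hours, so a long test that is still running after 16 hours on an 18TB drive would not be surprising by itself.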


I appreciate the advice. I am not opposed to buying a new drive - I just did not know if the CMOS clock might be an issue. I did start a long SELFTEST on ada2 last night and will post when I have it back.