What should I be doing with this sort of message?

I’ve been having problems with this happening regularly since I moved up to 24.04.0.

I find the box unresponsive, and after a reboot I have a bunch of these. They are never the same disk; they vary across all the disks in the system. The USB drives are a necessary evil, and the warning about them cannot be got rid of. They are two USB-connected SSDs in RAID 1, so that is as good as I could do, and I don’t think they are the root cause here.

I am not new to servers etc., but I am a relative newbie to TrueNAS, so I am not sure where to find the logs to work out what these are doing, or why, when I make no corrections, the next lot will be on totally different disks. This had not happened prior to 24, so I am wondering if there is something more sensitive to drive issues in that release that I should be considering. Any thoughts anyone has are welcome.

How are the disks connected? Also, you mention the two SSDs connected via USB are in RAID 1. Is this some form of UEFI RAID or is it a ZFS mirror?
Also, run a long SMART test on all your disks (manage disks, select all, manual test, long).
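Once a long test has run, the drive's self-test log shows whether it actually finished. Below is a minimal Python sketch, assuming the usual column layout printed by `smartctl -l selftest /dev/sdX` for ATA drives. The function names and the sample log text are made up for illustration and are not output from this system.

```python
import re

# Matches rows such as:
# "# 1  Extended offline    Completed without error       00%     21345         -"
# The column layout is an assumption based on typical smartctl ATA output.
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$")


def selftest_entries(text):
    """Parse self-test log rows into dictionaries, most recent first."""
    entries = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            num, kind, status, remaining, hours, _lba = match.groups()
            entries.append({
                "num": int(num),
                "type": kind,
                "status": status,
                "remaining_percent": int(remaining),
                "lifetime_hours": int(hours),
            })
    return entries


def latest_long_test(entries):
    """Return the most recent extended (long) test entry, or None."""
    for entry in entries:
        if entry["type"].startswith("Extended"):
            return entry
    return None


# Illustrative sample only, not real output from the box in this thread.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%     21345         -
# 2  Short offline       Completed without error       00%     21320         -
"""

latest = latest_long_test(selftest_entries(SAMPLE_LOG))
print(latest["status"] if latest else "no long test logged")
```

A status other than "Completed without error" on the newest extended entry suggests the test did not run to the end, for example because the machine rebooted part way through.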

Hi essinghigh.

My current server is a QNAP box. It has 4 SATA-connected drives in a RAID 5 array. It also has a PCIe card with two SSDs in RAID 1. The mirroring is done directly in TrueNAS, i.e. not using any form of hardware RAID. There are two USB-connected SSDs because the thing would not boot from the SSDs inside, and I would lose too much storage using one of the SATA slots.

I have run SMART tests, which came back fine, although I only ran the short test, so I will take your suggestion and run a full one to see what that brings back.

Could you post the output of smartctl -a /dev/sdb and smartctl -a /dev/sdf when the tests complete? Uncorrectable/unreadable sectors are usually an early indicator of a drive dying, so it is probably time to replace the drives (RMA them if you can).
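To pick the relevant counters out of a long `smartctl -a` dump, something like the following Python sketch may help. It assumes the standard ten-column ATA attribute table. The choice of attributes to watch, the function name, and the sample rows are illustrative assumptions, not values from these drives.

```python
# Attributes that are commonly checked first when sectors are reported bad.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def watched_attributes(smartctl_text):
    """Return {attribute name: raw value string} for the watched attributes."""
    found = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = fields[9]
    return found


# Illustrative sample only, not real output from /dev/sdb or /dev/sdf.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

for name, raw in watched_attributes(SAMPLE).items():
    print(f"{name}: {raw}")
```

In practice the text would come from a saved dump, e.g. the output of smartctl -a /dev/sdb redirected to a file and read back in. Non-zero raw values for pending or uncorrectable sectors are the ones worth posting.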

Are you sure these are always random disks? The /dev/sdx label can change if disks are moved around etc. I’d always note the serials of the drives to make sure you can go back and compare.
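One way to keep track of serials is to snapshot the device-to-serial mapping after each boot and compare. This is a rough Python sketch that assumes the JSON produced by `lsblk -d -J -o NAME,SERIAL`. The helper names and the placeholder serials are hypothetical, and USB enclosures may report no serial or the bridge's serial instead of the disk's.

```python
import json


def serials_by_device(lsblk_json):
    """Map device name (e.g. 'sdb') to serial from `lsblk -d -J -o NAME,SERIAL`."""
    data = json.loads(lsblk_json)
    return {dev["name"]: dev.get("serial") for dev in data["blockdevices"]}


def renamed_disks(before, after):
    """Return {serial: (old name, new name)} for disks whose /dev name changed."""
    old_names = {serial: name for name, serial in before.items() if serial}
    changes = {}
    for name, serial in after.items():
        if serial in old_names and old_names[serial] != name:
            changes[serial] = (old_names[serial], name)
    return changes


# Placeholder snapshots; the serials here are made up for illustration.
BOOT_1 = '{"blockdevices": [{"name": "sdb", "serial": "SERIAL-A"}, {"name": "sdf", "serial": "SERIAL-B"}]}'
BOOT_2 = '{"blockdevices": [{"name": "sdb", "serial": "SERIAL-B"}, {"name": "sdf", "serial": "SERIAL-A"}]}'

changes = renamed_disks(serials_by_device(BOOT_1), serials_by_device(BOOT_2))
for serial, (old, new) in changes.items():
    print(f"{serial}: was /dev/{old}, now /dev/{new}")
```

If the same serials keep turning up under different /dev/sdX names, the errors may be following a small set of physical disks rather than moving around at random.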

I would, but I cannot access the command line. If I do it as me, it says no permission. If I access it as root, it just hangs at a black screen.

Not sure if this is another part of the issue with it.

It is worrying and weird that login as root is hanging. Are /dev/sdb or /dev/sdf your boot drive(s)?
Are you accessing this via the Web-Shell or SSH? If one isn’t working, have you tried the other?

Also, given your initial mention of the box being unresponsive and this new mention of the shell hanging as root, you might be running into this issue that has been picked up with the release of Dragonfish. It is worth having a look to see if this fixes things.

I had not tried SSH. I just did that and got this…

image

It is interesting that I had not seen this before I put Dragonfish on the box, so it could well be that issue. The box has way less memory than I would like in a server, and I am getting tempted to build a new server just for this.

I’m going to look at the fix above and check any options I might have set that stop root logging in through SSH.

Can’t remember if the SSH service is enabled by default. System Settings → Services.

SSH is enabled, but there was no group selected in the password login group. I’ve changed that, and though SSH rejected me with the same dialog, logging into the console from settings just worked okay, so I will run the commands above and see what happens.

It’s not.

I tried running the full SMART scan of the disks yesterday. I am not sure if it completed; at some point the box dropped offline and I had to reboot to get it back again. So today I tried the smartctl command line instead. While I was figuring out how to get the data out of the shell (I still can’t get in via SSH), I had the shell open and noticed this…

It looks like something in Python is causing these, though I am not sure what. They keep appearing in the shell every couple of minutes, so it seems logical that after a while this might cause something to break. I’m going to disable the three apps I have running and see if this continues to happen, in the hope that one of them is leaking something…