TrueNAS Scale is unhelpful when replacing failed disks

I did this a couple of weeks back. It sort of sucked: I used lsblk, then physically crossed off each serial that was present in the array.
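The lsblk step can be scripted so the device-to-serial list is printed rather than copied by hand. This is a minimal sketch, assuming a Linux system whose lsblk supports JSON output (`-J`, `-d`, `-o NAME,SERIAL`). `parse_lsblk_serials` and `current_serials` are hypothetical helper names, not part of TrueNAS.

```python
import json
import subprocess


def parse_lsblk_serials(lsblk_json: str) -> dict[str, str]:
    """Map whole-disk device names (e.g. "sdd") to serial numbers.

    Expects the JSON printed by `lsblk -J -d -o NAME,SERIAL`.
    """
    data = json.loads(lsblk_json)
    serials = {}
    for dev in data.get("blockdevices", []):
        # SERIAL is null for virtual devices such as loop or zram.
        if dev.get("serial"):
            serials[dev["name"]] = dev["serial"].strip()
    return serials


def current_serials() -> dict[str, str]:
    """Run lsblk on the live system and parse its output (Linux only)."""
    out = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,SERIAL"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_lsblk_serials(out)
```

Calling `current_serials()` on the NAS returns something like `{"sda": "...", "sdd": "..."}`, which is a snapshot only: the sdX names are not stable across removals and reboots, so the serial is the thing to match on.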

I had 4 simultaneous failures (a physical issue), and the disk was dead, but I couldn't afford to accidentally replace the wrong drive.

And this was the problem. I pulled the wrong drive. And made my problems worse.

This would’ve saved me.

Instead it said something like "was /dev/sdd" and helpfully provided a link to the disk info, with the serial. I removed that disk.

Guess what. It wasn’t sdd anymore. The actual disk that failed was removed and had a different unknown serial.

So then we had to eliminate all the others.
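Eliminating the others is a set difference on serials: anything recorded as a pool member but no longer visible is the disk that went away, and anything visible but not on record is unaccounted for. A minimal sketch, assuming you kept your own record of the pool members' serials (for example from a label sheet); `reconcile` is a hypothetical helper, and `visible` could come from lsblk.

```python
def reconcile(recorded: set[str], visible: set[str]) -> tuple[list[str], list[str]]:
    """Compare recorded pool-member serials with serials the system can see now."""
    missing = sorted(recorded - visible)      # was in the pool, no longer present
    unexpected = sorted(visible - recorded)   # present now, but not on record
    return missing, unexpected
```

Matching on serials rather than on `/dev/sdX` avoids exactly the "it wasn't sdd anymore" trap, because the serial travels with the physical drive.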

Which disk has errors?

[Screenshot 2024-05-14_201355: pool status page]

Gee, I don't know. Which one does the pool status page say has an error? It's right there in front of you. Hint: is there something about sdg that's different from the others?

All I see is a drive that has a 1 under checksum. I don't know what that is or how to act on it. Nothing even says those are errors. Sorry, I forgot my crystal ball at home. I don't know if there is anything wrong with the drive, because I have no idea what drive it is. I have 9 drives in there. No idea which is which.

Point taken. You might want to create a JIRA issue about the column labels?

These are

  • read errors
  • write errors
  • checksum errors

respectively.

It might be nice to highlight rows with errors somehow.

Maybe a ⚠️ icon.
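Until the UI does that, the same read, write, and checksum columns can be pulled from `zpool status` output and any row with a nonzero count flagged. This is a sketch that assumes the usual OpenZFS config table layout (a `NAME STATE READ WRITE CKSUM` header followed by one row per pool, vdev, or device, ending at a blank line); `devices_with_errors` is a hypothetical helper. Depending on how the pool was created, the NAME column may show partition UUIDs rather than sdX names.

```python
HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


def devices_with_errors(zpool_status: str) -> list[tuple[str, str, str, str]]:
    """Return (name, read, write, cksum) for config rows with any nonzero count."""
    flagged = []
    in_table = False
    for line in zpool_status.splitlines():
        fields = line.split()
        if fields[:5] == HEADER:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:  # a blank line ends the config table
            break
        if len(fields) < 5:  # section labels such as "logs" or "spares"
            continue
        name, _state, read, write, cksum = fields[:5]
        # Counts can be abbreviated (e.g. "1.2K"), so compare as text, not int.
        if any(count != "0" for count in (read, write, cksum)):
            flagged.append((name, read, write, cksum))
    return flagged
```

Feeding it the text of `zpool status <pool>` and printing `⚠ name: read=… write=… cksum=…` for each returned tuple gives the highlighted rows; for a table like the one in the screenshot, only the sdg row would come back.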


You can click Manage Devices on the Storage > Topology widget for your pool to get a more readable output of read, write, and checksum errors.

I believe that particular Pool Status screen in your screenshot has been entirely replaced/removed in favor of the Storage dashboard in Dragonfish.