Hello,
I built my TrueNAS SCALE system (v25.04.0) using 3 brand-new WD Red Pros in a RAIDZ1. The mainpool holds all 3 drives. Storage shows “no errors” under the VDEV, but the pool shows as “unhealthy”, and each of the drives keeps accumulating checksum errors over time. The counts always match, so the 3 drives jump together from 10 to 15, then from 15 to 30 errors, etc.
I ran a scrub and ran both LONG and SHORT SMART drive tests… I can't tell whether these errors come from the hardware or from my data.
I also replaced the SAS/SATA cables and the SAS controller.
Thanks to Protopia for the following.
‘I have a standard set of commands I ask people to run to provide a detailed breakdown of the hardware, so please run these and post the output here (with the output of each command inside a separate (</> or Ctrl+e) preformatted text box) so that we can all see the details:’
lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
sudo zpool status -v
sudo zpool import
lspci
sudo storcli show all
sudo sas2flash -list
sudo sas3flash -list
Checksum errors can be due to disk hardware, but more often they relate to disk controller errors or overheating, power or SATA cable connections, PSU issues or memory issues. Reseating memory sticks, PCIe cards and power/SATA cables can often stop them from continuing to occur.
After reseating the memory run a memory test for a few hours.
Then run sudo zpool clear poolname on the pool experiencing errors to reset the error counters, and see what happens.
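A minimal sketch of that clear, scrub, and re-check sequence, assuming the pool is called mainpool as in the first post (substitute your own pool name). The run and recheck_pool functions and the DRY_RUN variable are hypothetical helpers added for illustration, not part of ZFS or TrueNAS:

```shell
#!/bin/sh
# Pool name is an assumption taken from the first post; override with POOL=yourpool.
POOL="${POOL:-mainpool}"

# run: hypothetical helper. With DRY_RUN=1 it only prints the command,
# otherwise it executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recheck_pool: reset the counters, start a fresh scrub, then show the pool state.
recheck_pool() {
    run sudo zpool clear "$POOL"      # reset the READ/WRITE/CKSUM error counters
    run sudo zpool scrub "$POOL"      # start a scrub; it keeps running in the background
    run sudo zpool status -v "$POOL"  # show scrub progress, counters and any damaged files
}
```

Call recheck_pool after sourcing the file, or run DRY_RUN=1 first to see the commands without touching the pool. The scrub does not finish immediately, so re-run sudo zpool status -v until it reports the scrub as complete, then check whether the CKSUM column has started climbing again.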
Thank you. While running the commands above, one of them reported a file that was actually corrupted. Once I removed it and ran another scrub, everything seems healthy again. I appreciate the quick response.
However, why isn’t there (or maybe there is?) a way to see that file, or the same details, from the GUI? It feels like this is an error (not some sort of advanced function), and there could be a way to dig deeper from the GUI.
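For anyone landing here with the same symptom: zpool status -v lists damaged files at the end of its output, under a line starting with "errors: Permanent errors". A rough sketch for pulling out just those paths, assuming that heading wording; damaged_files is a hypothetical helper name, not a ZFS command:

```shell
#!/bin/sh
# damaged_files: hypothetical helper. Reads `zpool status -v` text on stdin
# and prints one damaged path per line. Prints nothing when the pool
# reports no permanent errors.
damaged_files() {
    awk '
        # The file list starts after this heading line.
        /^errors: Permanent errors/ { listing = 1; next }
        # Print every non-blank line after the heading, without its indentation.
        listing && NF { sub(/^[ \t]+/, ""); print }
    '
}
```

Usage would be along the lines of sudo zpool status -v mainpool | damaged_files, where mainpool is again an assumed pool name. Entries that show up as dataset:<0x...> rather than a path are printed unchanged.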
In a way, the TrueNAS GUI is lacking some functionality. I don’t even know if the GUI has the function you are looking for in the request above.
On the other hand, TrueNAS (and FreeNAS before it) was always intended to have Unix Shell access for more detailed troubleshooting.
Some NASes are all about the GUI, and yet don’t cover everything. When odd failure modes occur and are not fixable or troubleshootable from the GUI, those users may be out of luck. Time to restore from backups.
Not saying one philosophy is better than another, just different.
One reason I chose TrueNAS is that Unix Shell is both readily available and downright useful at times. My prior NAS, an Infrant ReadyNAS 1000S, also had Unix Shell access. I simply could not live with either a NAS having a heavy focus on MS-Windows (which I don't use at home) or limited troubleshooting from any NAS GUI.