Hello,
I built my TN Scale (V 25.04.0) using 3 brand new WD Red Pros in a RaidZ1. The mainpool has the 3 drives, and while the storage shows “no errors” under the VDEV, the pool shows as “unhealthy” and each of the drives keeps accumulating more checksum errors over time - but they match, so the 3 drives jump from 10 to 15, then from 15 to 30 errors, etc.
I ran a scrub and did LONG and SHORT SMART drive tests… I am lost as to whether these errors are in the hardware or in my data.
I also replaced the SAS/SATA cables and the SAS controller.
Any help would be appreciated
A detailed hardware list would help
Thanks to Protopia for the following.
‘I have a standard set of commands I ask people to run to provide a detailed breakdown of the hardware, so please run these and post the output here (with the output of each command inside a separate (</> or Ctrl+e) preformatted text box) so that we can all see the details:’
lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
sudo zpool status -v
sudo zpool import
lspci
sudo storcli show all
sudo sas2flash -list
sudo sas3flash -list
Checksum errors can be due to disk hardware, but more often they relate to disk controller errors or overheating, power or SATA cable connections, PSU issues or memory issues. Reseating memory sticks, PCIe cards and power/SATA cables can often stop them from continuing to occur.
After reseating the memory run a memory test for a few hours.
Then do a sudo zpool clear poolname
for the pool experiencing errors to reset the error counters and see what happens.
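The clear-and-watch step above could be sketched as a small helper. This is only a rough sketch: `recheck_pool` and the `ZPOOL` override are hypothetical names introduced here, `poolname` stays a placeholder for the real pool name, and `zpool wait` assumes a reasonably recent OpenZFS (2.0 or later).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ZFS): reset the error counters, re-read
# every block with a scrub, then show the fresh per-device counters.
# ZPOOL is an assumed override so you can run it as ZPOOL="sudo zpool".
recheck_pool() {
    local pool="$1"
    local zpool="${ZPOOL:-zpool}"
    $zpool clear "$pool" || return 1          # reset READ/WRITE/CKSUM counters
    $zpool scrub "$pool" || return 1          # start a scrub of the whole pool
    $zpool wait -t scrub "$pool" || return 1  # block until the scrub finishes
    $zpool status -v "$pool"                  # report the counters afterwards
}
```

Called as `ZPOOL="sudo zpool" recheck_pool poolname`. If the counters climb again straight after a clean memory test and reseating, that points more towards something still being wrong in the hardware path than towards old damage in the data.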
Actually my standard list has evolved and is now:
lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
sudo ZPOOL_SCRIPTS_AS_ROOT=1 zpool status -vLtsc lsblk,serial,smartx,smart
sudo zpool import
lspci
sudo sas2flash -list
sudo sas3flash -list
sudo storcli show all
for disk in /dev/sd*; do sudo zdb -l "$disk"; done
for disk in /dev/sd?; do sudo hdparm -W "$disk"; done
for disk in /dev/sd?; do sudo smartctl -x "$disk"; done
though I normally remove any of these I don’t think will be helpful.
Given it’s happening to all three disks - you need to look at what is common to the 3 disks.
PSU, cabling and/or HBA/SATA expansion board are likely causes.
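To see at a glance whether the counts really do move together across the disks, something like the following could pull just the device name and CKSUM column out of a plain `zpool status`. A rough sketch only: `cksum_by_device` is a hypothetical name, and it assumes the default five-column NAME/STATE/READ/WRITE/CKSUM layout (extra flags such as `-s` or `-c` add columns and would need the field number adjusting).

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads `zpool status` output on stdin and prints
# "<device> <cksum-count>" for each pool/vdev/disk row.
# Assumes the default column order: NAME STATE READ WRITE CKSUM.
cksum_by_device() {
    awk 'NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|UNAVAIL|REMOVED|OFFLINE)$/ { print $1, $5 }'
}
```

Used as `sudo zpool status poolname | cksum_by_device`. Identical counts on every disk in the vdev are what you would expect if the fault sits in something they share rather than in any one drive.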