This afternoon, 5 of the 6 drives in my RAIDZ1 vdev dropped into a degraded state almost simultaneously, each with a near-identical number of read errors. I'm nearly certain it's not a drive-side issue, since they were all healthy as of yesterday, and a full scrub earlier this week reported no issues:
truenas_admin@truenas[~]$ zpool status -v
  pool: bulk
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see:
  scan: scrub repaired 0B in 4 days 21:46:50 with 0 errors on Thu Jan 15 21:46:53 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        bulk                                      ONLINE       0     0     0
          raidz1-0                                ONLINE     829     2     0
            fb824d60-21f2-4088-80a4-12a3cf58eaef  ONLINE       0     0     0
            9da156e5-3907-47c9-895e-3edc70c00479  DEGRADED   374     3     0  too many errors
            21ac9411-1bce-4af5-855d-cc3f8984e20d  DEGRADED   352     3     0  too many errors
            c9a7b6eb-37a7-4667-b3fa-0826b419b25b  DEGRADED   351     3     0  too many errors
            e504cf77-316f-49c4-b101-dfbef49e0208  DEGRADED   353     2     0  too many errors
            8884b9a5-e950-4365-b35e-b8e25776d2a7  DEGRADED   304     3     0  too many errors

errors: List of errors unavailable: pool I/O is currently suspended
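One way to sanity-check the "not a drive-side issue" theory is SMART attribute 199 (UDMA_CRC_Error_Count), which counts transfer errors on the SATA link rather than media errors. A minimal sketch, assuming `smartctl` from smartmontools is installed, the script is run as root (e.g. via sudo), and `smartctl -A` prints its usual attribute table with the raw value in the 10th column; `crc_count` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print SMART attribute 199 (UDMA_CRC_Error_Count) for each sdX disk.
# Assumes smartctl's attribute table layout, where RAW_VALUE is column 10.

# Read `smartctl -A` output on stdin; print the raw CRC count (nothing if absent).
crc_count() {
  awk '$1 == 199 { print $10 }'
}

if command -v smartctl >/dev/null 2>&1; then
  for disk in /dev/sd?; do
    [ -b "$disk" ] || continue   # skip if the glob matched no block devices
    printf '%s %s\n' "$disk" "$(smartctl -A "$disk" | crc_count)"
  done
fi
```

A CRC count that is elevated on the affected disks while their reallocated and pending sector counts stay flat would fit a cabling or controller problem; it is a hint, not proof.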
Given that all five drives dropped within the same 5-minute window and the temperature history looks fine, this looks like a controller issue to me, right? Is there anything else I should check before replacing the SATA controller?
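The kernel log from around the time the disks dropped may also help, since libata logs link resets and failed commands per ATA port (ataN). A rough sketch, assuming the usual libata message wording; `summarize_ata_errors` is a hypothetical helper, and its keyword list is a guess at common messages rather than an exhaustive set:

```shell
#!/bin/sh
# Sketch: count libata error/reset messages per ATA port (ataN) in kernel-log
# text read from stdin. The keyword list is an assumption, not exhaustive.

summarize_ata_errors() {
  grep -oE 'ata[0-9]+(\.[0-9]+)?: .*(exception|failed command|hard resetting link|SError|COMRESET failed|link is slow)' \
    | sed -E 's/^(ata[0-9]+)(\.[0-9]+)?:.*/\1/' \
    | sort | uniq -c | sort -rn
}

# Example: the current boot's kernel ring buffer (usually needs root).
dmesg 2>/dev/null | summarize_ata_errors
```

If the machine has been rebooted since the disks dropped, `journalctl -k -b -1` (which needs a persistent journal) can be piped in instead of `dmesg`. Counts that cluster on ports behind a single controller would support the controller theory.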
Any tips on how best to shut everything down, swap the SATA controller, and bring things back up without accidentally obliterating 70TB of data?
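One precaution for the bring-up step: ZFS identifies pool members by the labels written on the disks, not by port or controller, so moving the disks to a new controller should not by itself confuse the pool. It still seems worth confirming that all six raidz1-0 member partitions are visible before running `zpool clear` (or importing the pool, if it does not come back on its own). A minimal sketch, assuming, as on TrueNAS, that the IDs in `zpool status` are partition UUIDs exposed under /dev/disk/by-partuuid; `check_members` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: confirm every raidz1-0 member partition is visible after the swap.
# Assumes the IDs shown by `zpool status` are partition UUIDs that appear
# under /dev/disk/by-partuuid once the kernel sees the disk.

# check_members DIR UUID... : report each UUID with no entry in DIR.
# Returns 0 only if every UUID is present.
check_members() {
  dir=$1
  shift
  missing=0
  for id in "$@"; do
    if [ ! -e "$dir/$id" ]; then
      echo "missing: $id"
      missing=$((missing + 1))
    fi
  done
  echo "$(($# - missing))/$# members present"
  [ "$missing" -eq 0 ]
}

# Member IDs copied from the zpool status output in this post.
MEMBERS="fb824d60-21f2-4088-80a4-12a3cf58eaef
9da156e5-3907-47c9-895e-3edc70c00479
21ac9411-1bce-4af5-855d-cc3f8984e20d
c9a7b6eb-37a7-4667-b3fa-0826b419b25b
e504cf77-316f-49c4-b101-dfbef49e0208
8884b9a5-e950-4365-b35e-b8e25776d2a7"

if [ -d /dev/disk/by-partuuid ]; then
  # $MEMBERS is deliberately unquoted so each UUID becomes its own argument.
  if check_members /dev/disk/by-partuuid $MEMBERS; then
    echo "All members visible."
  else
    echo "Not all members visible: check cabling/controller before touching the pool."
  fi
fi
```

A RAIDZ1 vdev can only run with at most one member missing, so with five of six disks affected it seems safer to wait until all six show up than to clear or import with several still absent.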
The obligatory details, to head off the "we need to know what hardware you are running" first comment:
truenas_admin@truenas[~]$ lscpu
Architecture: x86_64
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core™ i7-12700K
truenas_admin@truenas[~]$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        29Gi       1.6Gi       164Mi       1.3Gi       2.0Gi
truenas_admin@truenas[~]$ lsblk | grep disk
sda 8:0 0 1.8T 0 disk
sdb 8:16 0 3.6T 0 disk
sdc 8:32 0 3.6T 0 disk
sdd 8:48 0 3.6T 0 disk
sde 8:64 0 14.6T 0 disk
sdf 8:80 0 3.6T 0 disk
sdg 8:96 0 14.6T 0 disk
sdh 8:112 0 14.6T 0 disk
sdi 8:128 0 14.6T 0 disk
sdj 8:144 0 14.6T 0 disk
sdk 8:160 0 14.6T 0 disk
nvme0n1 259:0 0 931.5G 0 disk
truenas_admin@truenas[~]$ lspci | grep SATA
00:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11)
01:00.0 SATA controller: ASMedia Technology Inc. ASM1064 Serial ATA Controller (rev 02)
truenas_admin@truenas[~]$ lsscsi -g
[0:0:0:0] disk ATA WDC WDS200T2B0B- 90WD /dev/sda /dev/sg1
[4:0:0:0] disk ATA CT4000BX500SSD1 082 /dev/sdc /dev/sg2
[5:0:0:0] disk ATA CT4000BX500SSD1 082 /dev/sdb /dev/sg3
[6:0:0:0] disk ATA CT4000BX500SSD1 082 /dev/sdd /dev/sg4
[7:0:0:0] disk ATA CT4000BX500SSD1 082 /dev/sdf /dev/sg5
[8:0:0:0] enclosu AHCI SGPIO Enclosure 2.00 - /dev/sg0
[9:0:0:0] disk ATA ST16000DM001-3Y4 DN01 /dev/sdg /dev/sg6
[12:0:0:0] disk ATA ST16000DM001-3Y4 DN01 /dev/sde /dev/sg7
[29:0:0:0] disk ATA ST16000DM001-3Y4 DN01 /dev/sdi /dev/sg8
[30:0:0:0] disk ATA ST16000DM001-3Y4 DN01 /dev/sdh /dev/sg9
[31:0:0:0] disk ATA ST16000DM001-3Y4 DN01 /dev/sdj /dev/sg10
[32:0:0:0] disk ATA ST16000DM001-3Y4 DN01 /dev/sdk /dev/sg11
[N:0:6:1] disk Samsung SSD 980 PRO 1TB__1 /dev/nvme0n1 -
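The lsscsi output does not show which of the two SATA controllers (Intel PCH at 00:17.0, ASM1064 at 01:00.0) each disk hangs off. A minimal sketch for checking that, assuming a standard Linux sysfs layout in which /sys/block/sdX resolves to a path containing the controller's PCI address (with a 0000: domain prefix that lspci omits) and the libata port; the helper names are hypothetical, and the example path in the comment is illustrative, not taken from this machine:

```shell
#!/bin/sh
# Sketch: show which PCI SATA controller and libata port each sdX disk sits behind.
# Illustrative (hypothetical) resolved path for a disk behind an add-in controller:
#   /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/ata9/host8/target8:0:0/8:0:0:0/block/sdg

# Last PCI address (domain:bus:device.function) in a sysfs path.
# Bridges appear first in the path, so the last match is the controller itself.
pci_from_path() {
  printf '%s\n' "$1" | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]' | tail -n 1
}

# libata port name (e.g. ata9) from the same path; empty for non-ATA devices.
ata_port_from_path() {
  printf '%s\n' "$1" | grep -oE '/ata[0-9]+/' | tr -d '/' | tail -n 1
}

for dev in /sys/block/sd*; do
  [ -e "$dev" ] || continue   # glob matched nothing: no sdX disks present
  path=$(readlink -f "$dev")
  printf '%-5s %-14s %s\n' "${dev##*/}" "$(pci_from_path "$path")" "$(ata_port_from_path "$path")"
done
```

If all five degraded disks resolve to 0000:01:00.0, that would point at the ASM1064 (or the cabling behind it) rather than the drives themselves.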