Two failed boot drives in a month - is my system speeding up disk failure?

Hello everyone,

Recently I’ve had some issues with one of my boot drives and I could use some help, or at least some advice.

I’m not a programmer; I just know how to search for things online, how to read documentation (maybe poorly), and when not to copy/paste and run a command line I find online.
Some additional info: the CPU has no integrated graphics, and I have no spare GPU left to put in the build. So I’m limited to the web GUI and SSH for troubleshooting…

Long story short, I have a system (my first and only TN build) that has been running for about 2 years now. The motherboard, CPU, RAM, PSU and Crucial SSD are from an old desktop setup from around 2022; the HDDs and PNY SSD were new for this build (2024). The system ran without issue until last December, when I first encountered a problem with the PNY boot SSD: the alert "Boot pool status is DEGRADED: One or more devices has experienced an unrecoverable error." and a boot pool status of Degraded. The mirrored Crucial SSD was fine.

  • I shut down the server to check/reseat the cables: everything looked fine, but the server wouldn’t boot afterward.
  • I swapped the SATA power/data cables: no effect.
  • I swapped the 2 SSDs: the server could boot from the Crucial drive only.
  • I then connected the faulty PNY drive to my desktop: it messed up my Windows OS on another drive, so I gave up on this PNY drive.

I ran on the (mirrored) Crucial SSD alone for 2~3 weeks, and in the meantime reached out to PNY customer service, who sent me a new drive under warranty.
I used the Replace function on the System>Boot>Boot Pool Status page and put the new PNY SSD in the system on Jan. 1st.
As of today, this new PNY SSD (listed as /sde) has 3 partitions:

/sde
Device       Start       End   Sectors   Size Type
/dev/sde1       40      2087      2048     1M BIOS boot
/dev/sde2     2088   1050663   1048576   512M EFI System
/dev/sde3  1050664 488397134 487346471 232.4G Solaris /usr & Apple ZFS
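
As a quick sanity check on the listing above, the sector counts and sizes can be recomputed from the Start/End columns. This is only a sketch: it assumes 512-byte logical sectors (which matches the Sector Size line in the SMART report further down) and treats "4 KiB aligned" as a start offset divisible by 4096 bytes. The names `PARTITIONS` and `describe` are made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# (name, start sector, end sector) copied from the listing above
PARTITIONS = [
    ("/dev/sde1", 40, 2087),
    ("/dev/sde2", 2088, 1050663),
    ("/dev/sde3", 1050664, 488397134),
]

def describe(start, end):
    """Return (sector count, size in bytes, starts on a 4 KiB boundary)."""
    sectors = end - start + 1          # both ends are inclusive
    size_bytes = sectors * SECTOR_BYTES
    aligned_4k = (start * SECTOR_BYTES) % 4096 == 0
    return sectors, size_bytes, aligned_4k

for name, start, end in PARTITIONS:
    sectors, size_bytes, aligned = describe(start, end)
    print(f"{name}: {sectors} sectors, {size_bytes / 2**20:.1f} MiB, 4 KiB aligned: {aligned}")
```

Under that assumption all three partitions start on a 4 KiB boundary, so the partition layout by itself would probably not explain the "Unaligned write command" sense data in the kernel log.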

I logged into the web GUI on Jan. 16 and again saw the alert "Boot pool status is DEGRADED: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected." This time it was for the new PNY SSD, which is connected to the same SATA cables as the previous failed one.
I grabbed a snippet of the console, with the same messages repeating every 5s or so (the only thing that changed was the tag#):

log
…
Jan 16 18:34:22 truenas kernel: sd 14:0:0:0: [sde] tag#3 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
Jan 16 18:34:22 truenas kernel: zio pool=boot-pool vdev=/dev/sde3 error=5 type=5 offset=0 size=0 flags=2098304
Jan 16 18:34:22 truenas kernel: ata15: EH complete
Jan 16 18:34:22 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:22 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:22 truenas kernel: ata15: EH complete
Jan 16 18:34:22 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:22 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:22 truenas kernel: ata15: EH complete
Jan 16 18:34:22 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:22 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:22 truenas kernel: ata15: EH complete
Jan 16 18:34:22 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:22 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:22 truenas kernel: ata15: EH complete
Jan 16 18:34:22 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:22 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:22 truenas kernel: ata15: EH complete
Jan 16 18:34:22 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:22 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:22 truenas kernel: sd 14:0:0:0: [sde] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jan 16 18:34:22 truenas kernel: sd 14:0:0:0: [sde] tag#18 Sense Key : Illegal Request [current]
Jan 16 18:34:22 truenas kernel: sd 14:0:0:0: [sde] tag#18 Add. Sense: Unaligned write command
Jan 16 18:34:22 truenas kernel: sd 14:0:0:0: [sde] tag#18 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
Jan 16 18:34:22 truenas kernel: zio pool=boot-pool vdev=/dev/sde3 error=5 type=5 offset=0 size=0 flags=2098304
Jan 16 18:34:22 truenas kernel: ata15: EH complete
Jan 16 18:34:26 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:26 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:26 truenas kernel: ata15: EH complete
Jan 16 18:34:26 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:26 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:26 truenas kernel: ata15: EH complete
Jan 16 18:34:26 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:26 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:26 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: sd 14:0:0:0: [sde] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jan 16 18:34:27 truenas kernel: sd 14:0:0:0: [sde] tag#27 Sense Key : Illegal Request [current]
Jan 16 18:34:27 truenas kernel: sd 14:0:0:0: [sde] tag#27 Add. Sense: Unaligned write command
Jan 16 18:34:27 truenas kernel: sd 14:0:0:0: [sde] tag#27 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
Jan 16 18:34:27 truenas kernel: zio pool=boot-pool vdev=/dev/sde3 error=5 type=5 offset=0 size=0 flags=2098304
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:27 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:27 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:27 truenas kernel: sd 14:0:0:0: [sde] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jan 16 18:34:27 truenas kernel: sd 14:0:0:0: [sde] tag#16 Sense Key : Illegal Request [current]
Jan 16 18:34:27 truenas kernel: sd 14:0:0:0: [sde] tag#16 Add. Sense: Unaligned write command
Jan 16 18:34:27 truenas kernel: sd 14:0:0:0: [sde] tag#16 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
Jan 16 18:34:27 truenas kernel: zio pool=boot-pool vdev=/dev/sde3 error=5 type=5 offset=0 size=0 flags=2098304
Jan 16 18:34:27 truenas kernel: ata15: EH complete
Jan 16 18:34:31 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:31 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:31 truenas kernel: ata15: EH complete
Jan 16 18:34:31 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:31 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:31 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: sd 14:0:0:0: [sde] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jan 16 18:34:32 truenas kernel: sd 14:0:0:0: [sde] tag#11 Sense Key : Illegal Request [current]
Jan 16 18:34:32 truenas kernel: sd 14:0:0:0: [sde] tag#11 Add. Sense: Unaligned write command
Jan 16 18:34:32 truenas kernel: sd 14:0:0:0: [sde] tag#11 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
Jan 16 18:34:32 truenas kernel: zio pool=boot-pool vdev=/dev/sde3 error=5 type=5 offset=0 size=0 flags=2098304
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: ata15: EH complete
Jan 16 18:34:32 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:32 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:32 truenas kernel: sd 14:0:0:0: [sde] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jan 16 18:34:32 truenas kernel: sd 14:0:0:0: [sde] tag#22 Sense Key : Illegal Request [current]
Jan 16 18:34:32 truenas kernel: sd 14:0:0:0: [sde] tag#22 Add. Sense: Unaligned write command
Jan 16 18:34:32 truenas kernel: sd 14:0:0:0: [sde] tag#22 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
Jan 16 18:34:32 truenas kernel: zio pool=boot-pool vdev=/dev/sde3 error=5 type=5 offset=0 size=0 flags=2098304
Jan 16 18:34:32 truenas kernel: ata15: EH complete
…
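
To get a feel for how the excerpt above breaks down, the repeating messages can be tallied by type. This is a minimal sketch, assuming the log has been saved as plain text. The labels in `PATTERNS`, the `tally` helper, and the short `SAMPLE` (a few lines copied from the excerpt) are only illustrative and are not part of any TrueNAS tool.

```python
import re
from collections import Counter

# Illustrative labels mapped to message shapes that appear in the excerpt above.
PATTERNS = {
    "link_reconfigured": re.compile(r"ata\d+\.\d+: configured for UDMA/\d+"),
    "invalid_chs": re.compile(r"device reported invalid CHS sector"),
    "eh_complete": re.compile(r"ata\d+: EH complete"),
    "unaligned_write_sense": re.compile(r"Add\. Sense: Unaligned write command"),
    "zfs_io_error": re.compile(r"zio pool=\S+ vdev=\S+ error=\d+"),
}

def tally(lines):
    """Count how many lines match each labelled pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts

# A few lines copied from the excerpt, one of each kind.
SAMPLE = """\
Jan 16 18:34:22 truenas kernel: ata15: EH complete
Jan 16 18:34:22 truenas kernel: ata15.00: configured for UDMA/33
Jan 16 18:34:22 truenas kernel: ata15.00: device reported invalid CHS sector 0
Jan 16 18:34:22 truenas kernel: sd 14:0:0:0: [sde] tag#18 Add. Sense: Unaligned write command
Jan 16 18:34:22 truenas kernel: zio pool=boot-pool vdev=/dev/sde3 error=5 type=5 offset=0 size=0 flags=2098304
""".splitlines()

for label, count in tally(SAMPLE).items():
    print(label, count)
```

In the excerpt, each failed Synchronize Cache is preceded by several "configured for UDMA/33" / "EH complete" cycles, meaning the kernel's ATA error handler keeps recovering the link. That pattern is often associated with link-level trouble (cable, port, power) rather than with the flash itself, though the log alone does not prove it.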

Boot Pool Status showed tens of thousands of checksum errors, so I ran a long SMART test. The result seems fine, I guess:

SMART
=== START OF INFORMATION SECTION ===
Device Model:     PNY CS900 250GB SSD
Serial Number:    PNY24462411120100C1F
LU WWN Device Id: 5 f8db4c 244600c1f
Firmware Version: CS900202
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/6061
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Fri Jan 16 19:09:53 2026 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection:                (  720) seconds.
Offline data collection
capabilities:                    (0x79) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   3) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       505
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       7
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
170 Unknown_Attribute       0x0003   034   034   010    Pre-fail  Always       -       854698491935
173 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       4
194 Temperature_Celsius     0x0023   067   067   000    Pre-fail  Always       -       33 (Min/Max 33/33)
218 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
231 Unknown_SSD_Attribute   0x0013   100   100   000    Pre-fail  Always       -       100
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       179

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       293         -
# 2  Extended offline    Completed without error       00%       124         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1        0        0  Not_testing
2        0        0  Not_testing
3        0        0  Not_testing
4        0        0  Not_testing
5        0        0  Not_testing
Selective self-test flags (0x0):
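
One line in the report above that may be worth a second look is `SATA Version is`, which lists both the drive's maximum and the currently negotiated link speed. The sketch below pulls the two numbers out of that line. The function name `link_speeds` and the idea of flagging a mismatch are assumptions for illustration; smartctl itself does not raise such a warning.

```python
import re

# Matches e.g. "SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 1.5 Gb/s)"
SATA_LINE = re.compile(
    r"SATA Version is:\s+.*?(\d+(?:\.\d+)?) Gb/s \(current: (\d+(?:\.\d+)?) Gb/s\)"
)

def link_speeds(report):
    """Return (max_gbps, current_gbps) from smartctl info text, or None if absent."""
    match = SATA_LINE.search(report)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

# Line copied from the report above.
REPORT = "SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 1.5 Gb/s)"

speeds = link_speeds(REPORT)
if speeds is not None:
    max_gbps, current_gbps = speeds
    if current_gbps < max_gbps:
        print(f"Link negotiated at {current_gbps} Gb/s, below the drive's {max_gbps} Gb/s")
```

For the report above this yields a 6.0 Gb/s maximum and a 1.5 Gb/s current speed. A link that has fallen back to the lowest SATA speed, together with the `UDMA/33` lines in the kernel log, can be a sign of a marginal cable, port, or power connection, but it is a hint about where to look rather than a diagnosis.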

I powered off the system, replaced the SATA data cable with a new one, and booted the system back up: it started, and the recurring messages in the log didn’t show up.
Checksum errors quickly rose to ~12k, and the count has kept growing by 50~100 each day since.

Everything seems to work as well as before (so far). I run a few apps/Docker containers; the system is mainly used for qBittorrent, Handbrake and Jellyfin.

I’m at a loss as to what’s happening to my boot drive and how concerned I should be:

  • Is my ‘new’ PNY SSD already dead, after less than a month in use? Is there something I can fix (how?) or are these checksum errors permanent?
  • Am I just really unlucky, with 2 failed drives?
  • Or could something in my setup be causing or speeding up SSD failure?

I’m open to any advice (even if it’s ‘get another brand of SSD, ASAP’) and can provide further info/logs if needed.

Thank you very much!

System

Scale 25.04.0

MB: Asus Prime B450M-A
CPU: AMD Ryzen 5 3600 (3.6 GHz)
RAM: 2x 16GB DDR4 3200MHz (non-ECC, G.Skill)
PSU: some 650W 80 Plus Gold

Boot Pool: 2x SSD mirrored - PNY CS900 250GB / Crucial MX100 256GB

Storage: 3x HDD Seagate Ironwolf 4TB (ST4000VN006) in RAIDZ1

Ah yes - my favourite kind of attribute to check… and you get FOUR of them? Lucky. Plus, also:

Jokes aside, you mentioned you changed the SATA power & data cables, but any chance you tried a different port on the motherboard? Otherwise, I mean, I’ve had SSDs and HDDs go faulty two at a time before, and I’ve had ports go faulty too, so I guess bad things can happen to good hardware. Hopefully someone has something more useful than that.

The three partitions on the boot drive are normal; mine shows the same.

That’s a good point, I did not. I will try putting the drive on another port and post an update when I have time, thanks for the advice.

That being said, I don’t know if it will boot at all; I don’t remember how the boot order is configured in the BIOS. Hopefully it’ll work.

Check the motherboard manual to see how to set the boot order and access the BIOS/UEFI menus.

That’s the neat part: I can’t. I don’t have a GPU in the system, and my CPU doesn’t have an iGPU. I could enter the BIOS, but I’d be going in blind…

If you have a mirrored boot pool, there is always the option to disconnect one of the boot drives; generally the BIOS will default to whichever disk has a bootable image. Considering we’re already having constant issues with boot drives, this carries risk.