Multiple Boot SSD Failures within 2-3 months

Hello All,

I have had a TrueNAS server for about 6-8 months now (time flies). It's been great up until these past few months.

Around December (2 months ago), I noticed issues with the server: I could not log in (admin or SMB share), and it would require a power off/on to recover. Additionally, I would get notifications about boot pool failure and increasing error counts on the boot SSD (an Inland Platinum 128 GB). I turned the server off, connected the drive to another computer, and scanned it. It came back with 5 bad blocks, and I could not wipe it at all (I tried multiple programs; could not delete partitions or wipe the disk).

Replaced the boot SSD with a brand-new Patriot P220 256 GB. No issues until about a week ago (Feb 1st), when I started getting alerts about increasing error counts on the SSD. I took it out, scanned it, and it showed 50+ bad blocks and crashed half the time (after crashing, all subsequent blocks show up as bad).

The only things that changed around when the first SSD started having issues were updating TrueNAS to Fangtooth and installing a Wavlink 5 Gbps Ethernet card.

Since it is 2 separate SSDs (both new when installed into the system), I think it's one of the following:

1: Other hardware causing issues (motherboard, SATA controller, 5 Gigabit card, maybe even the cable?)

2: I am dumb and configured something wrong (most likely)

3: Some weird bug with Fangtooth (I believe the first SSD was fine until I installed Fangtooth)

System Info:

MOBO: BKHD 1264 NAS Motherboard

CPU: Intel N100

RAM: 16 GB DDR5 Sodimm 4800 MHz - Crucial

Boot SSD: (Previous) Inland Platinum 128 GB, (Current) Patriot P220 256 GB

Apps SSD: Silicon Power 256 GB (P34A60 i think)

Cache SSD: Teamgroup MP 33 512 GB

Storage Drives: 2x Seagate Ironwolf 12 TB

Extra: Wavlink 5Gbps Card

OS Version: 25.04.2.6

OS Type: Generic, Community

Let me know if I need to add anything else. I'd rather not shred through SSDs, especially right now.

Thank you for your time.

If it is not just bad coincidence… I would suspect those more (and the PSU more than the 5 Gbps card).
Are you using an adapter (like Molex-to-SATA) or a cable splitter?

Honestly? Quite unlikely.
Having said that, the boot pool is pretty stressed (so much so that using USB thumb drives, as some did a while ago, is pretty discouraged), but if an SSD can't even survive 2 months, it was really poor quality (the worst Chinese SSD I have had at least survived 6k hours :smile: ).

Do you NEED the L2ARC? Take a look at the hit ratio statistics in `sudo arc_summary` and see if you are even using it much. With only 16 GB of RAM, you may be better off removing the L2ARC, as that will free up a bit more RAM for the regular ARC to use. That would also free up a different SATA port that you could try for your boot pool. Maybe changing positions on the SATA connectors will help.
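If you want to pull the hit ratios out of the report without eyeballing the whole thing, here is a rough Python sketch. The label strings and layout are assumptions based on typical OpenZFS `arc_summary` output and vary between versions, so adjust the patterns to match what your system actually prints:

```python
import re

def parse_hit_ratios(arc_summary_text: str) -> dict:
    """Extract ARC and L2ARC hit percentages from arc_summary output.

    Assumes lines shaped like 'ARC hits:   99.2 %   1234567';
    the exact wording differs across OpenZFS versions.
    """
    ratios = {}
    for label in ("ARC hits", "L2ARC hits"):
        m = re.search(rf"{re.escape(label)}:\s+([\d.]+)\s*%", arc_summary_text)
        if m:
            ratios[label] = float(m.group(1))
    return ratios

# Example against a trimmed, hypothetical sample of arc_summary output:
sample = """
ARC hits:                                      99.2 %     1234567
L2ARC hits:                                     1.3 %         890
"""
print(parse_hit_ratios(sample))  # {'ARC hits': 99.2, 'L2ARC hits': 1.3}
```

A low L2ARC hit percentage like the one in the sample would suggest the cache device is doing very little and could be removed.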

Thank you for the reply.

  1. Forgot to mention the PSU is a Silverstone NJ450 (SFX 80+ Platinum). I do have the boot drive plugged into the same power cable as the 2 hard drives; it is the cable that came with the power supply. No adapters used.
  2. I agree that it's very unlikely; it's just that the issues started happening when I updated. Both SSDs are from reputable brands, which is why I made this post in the first place, as both dying/having issues within 2-6 months is too crazy to be a coincidence, IMO.

Thank you for your reply. I don't think I NEED it. Both the apps SSD and the L2ARC (what I called cache) SSD are M.2 NVMe drives, whereas the 2 boot drives are SATA.

Since the P220 debacle, I have changed the cable and moved it to another port, but I haven't tried reinstalling on a new drive yet.

Is there a history of TrueNAS being harder on SATA drives than M.2 drives, or does it maybe require SSDs with a DRAM cache?

A SATA SSD or M.2 drive should be fine for the boot pool. If you look at the failed drives using the manufacturer's software, you should be able to see how much data was written to them. It may just have been two bad SSDs in a row.

You may want to edit your drive info to distinguish M.2 from SATA; I thought all the SSDs were SATA. You might be able to drop the L2ARC and add that drive to the apps pool as a mirror. A mirror vdev for apps is probably a better choice, as it would add redundancy to that pool.
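On checking how much data the failed drives had written: many consumer SATA SSDs expose this as SMART attribute 241 (Total_LBAs_Written). A small sketch of the conversion, assuming 512-byte LBAs (some vendors count in larger units, so check the drive's documentation before trusting the number):

```python
def lbas_written_to_tb(total_lbas: int, sector_bytes: int = 512) -> float:
    """Convert a raw SMART Total_LBAs_Written value to terabytes.

    Assumes the drive counts 512-byte LBAs; some vendors report
    in 32 MiB units instead, which would make this wildly wrong.
    """
    return total_lbas * sector_bytes / 1e12

# e.g. a hypothetical raw value of 19_531_250_000 LBAs at 512 B each:
print(f"{lbas_written_to_tb(19_531_250_000):.1f} TB written")  # 10.0 TB written
```

Comparing that figure against the drive's rated endurance (TBW) would show whether the SSD died from wear or from an outright defect.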

I’m not a fan of those Chinese NAS motherboards for various reasons, but this would be the first time I’ve read about something so odd. And as you say, it also seems like a really unlikely coincidence.
For sure, if there is really something not working properly in the hardware, sooner or later you will start having problems on the other disks too, IMHO.
First thing: can you at least RMA those broken disks?
Second thing: in a normal situation there is no technical need to use M.2 instead of a SATA disk for the boot pool… But what if you remove the L2ARC disk (really unnecessary, and probably worse than nothing with such a small amount of RAM) and put a small 16 GB Optane/pseudo-Optane there? They are still crazy cheap on AliExpress (with ~10€ you can grab 2-3 of them, depending on the discount period), and in my experience they are totally trustworthy.
The advice to mirror the apps pool is totally right, but you could equally replicate that data to the spinning-disk pool for a fast and easy-to-set-up backup, and trade the redundancy for the stability of the boot pool. Just my 2 cents.

Thank you for the advice. The Inland 128 GB has already been RMA'd, so I have a new one. Trying to keep it from meeting the same fate.

I will look at the Patriot drive using Patriot's software. I did look at it using CrystalDiskInfo, but it mainly just showed the bad blocks.

I haven't gotten around to installing any apps yet, as this issue has taken precedence, but I'll definitely reconfigure the L2ARC drive to instead mirror the apps pool if it isn't necessary. I mainly use this server for storing personal files but plan on running apps like Jellyfin.

Thank you for the reply.

Same here regarding this type of issue with the Chinese NAS mobos. I had heard good reviews about the BKHD N100 mobo and have not heard anything about it causing these types of issues.

  1. The 1st disk (Inland 128 GB) was RMA'd. I have the replacement ready but want to keep the issue from popping up again. I plan to RMA the 2nd disk (Patriot P220 256 GB).
  2. What I may do is remove the L2ARC drive (Teamgroup 512 GB) and get one of those Optane drives instead. I might also install the boot pool onto my apps drive, and then make the 2 SATA drives (Inland 256 GB and Patriot P220 256 GB replacement) a mirrored apps pool.

Probably unrelated, but it popped up during a search.

There is a thread about issues with this board/N100 and Crucial RAM.

What type of case do you have? Or rather, is there any chance those SSDs ran too hot for a longer period of time?

Thank you for the reply. I will run some memory tests with the stick in the board. Hopefully it's not the RAM stick :frowning: .

The case is a BitFenix Prodigy with a 200 mm Noctua intake fan in the front. The SSD is right by the fan and gets plenty of cool air.

Once, many years ago, I had a problem with a faulty molex connector that was causing intermittent power issues that were impossible to troubleshoot. Eventually, I tried using a different power connector and my problems went away.

Not a high probability, but try using a different power connector for the SSD or perhaps swap out the power supply if you happen to have a spare.

Good luck.

System: TrueNAS 25.04.2.6 | Supermicro X9SCM-F | Xeon E3-1240V2
32GB ECC RAM | PNY 120GB SSD for boot
3 WD Red 4TB + 1 HGST Deskstar NAS 4TB in RaidZ2
Toshiba 128GB M.2 SSD

Thank you for the anecdote. The previous config had the SSD's power on the same cable as the 2 hard drives, but I will change it to its own separate cable.

Thank you all for your help.

I've got a ticket in with TrueNAS, so maybe they can give a more direct diagnosis, but here are the steps I will be taking that will hopefully resolve this issue. I did run MemTest86+ on the RAM and it passed with no issues (thank goodness).

  1. RMA the Patriot P220 SSD; reinstall TrueNAS on the Inland Platinum 256 GB
  2. New SATA data cable for the SSD, in a different SATA port
  3. Add a SATA power cable to the (fully modular) power supply and connect the boot SSD to its own cable (previously daisy-chained with the 2 hard drives)
  4. Monitor the new SSD closely
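For the monitoring step, one rough way to automate the watching is to periodically scan `zpool status` output for nonzero READ/WRITE/CKSUM counters. A hedged Python sketch; it assumes the usual five-column vdev layout and plain integer counters (real output can abbreviate large counts like `1.2K`, which this simple pattern would miss):

```python
import re

def find_drive_errors(zpool_status_text: str) -> list:
    """Return vdev names whose READ/WRITE/CKSUM counters are nonzero.

    Matches lines shaped like 'sda3  ONLINE  0  0  12' from the
    config section of `zpool status`.
    """
    problems = []
    for line in zpool_status_text.splitlines():
        m = re.match(
            r"\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED)\s+(\d+)\s+(\d+)\s+(\d+)",
            line,
        )
        if m and any(int(n) > 0 for n in m.groups()[2:]):
            problems.append(m.group(1))
    return problems

# Hypothetical sample showing a boot device with checksum errors:
sample = """
        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sda3      ONLINE       0     0    12
"""
print(find_drive_errors(sample))  # ['sda3']
```

Run from a cron job against `zpool status` and mailed on a nonempty result, something like this would flag a failing boot SSD before TrueNAS faults the pool.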

Just want to see if that helps at all. If not, I will definitely look into an Optane drive for the boot pool. I will also remove the L2ARC drive since it's not really needed, but I want to get this working and have it stay working for a little bit before doing that.

I will report back my findings in a month or 2, whether the same issue comes up or it's all hunky-dory.
