SSD going offline/unavailable with no ZFS or SMART errors

So I have 8 brand new Samsung SSDs in 2 vdevs. One SSD keeps going offline, sometimes while running and sometimes during boot.

I used this same ASRock MB and Power and Cooling 500W PSU for 4 years with no issues. I did go from 6 SSDs to 8 and added a 10G NIC, but that’s about it. Now one SSD keeps going offline. At one point it lasted a few days, then dropped. I am using four 2-way SATA power splitters, two on each of the main PSU cables. I am using these because they fit the drives more easily, as the drives are installed very close to each other. Could these splitters be an issue, or could the PSU itself be dying?

I would not think 8 SSDs would take much power, but maybe the PSU is not stable on the rails these devices use (3.3 V/5 V)?

While I am scrubbing the pool now, my UPS only shows around 100 W of usage, so not too much.

Any suggestions?

Switch the disconnecting drive with another and see if the issue follows it. If it does, possible bad drive. If it does not, possible bad cable.

Thanks. I think the issue is that it’s a bit difficult to see which SN is which without completely taking out some or all of the disks. I will see what I can do.

Sometimes you can get an idea based on enumeration order and depending on how your ports are laid out you can deduce it from minimal foolery. Good luck!

So I swapped what I think were SATA 0 and 1, as the failed drive appeared to be the 2nd disk (SATA 1). After I did this the pool was offline and showed a message: “Disks with exported pools” for 6 of my 8 disks, LOL. I rebooted and then the pool was online but had an unavailable disk in each vdev. Similar to before, but 2 disks now.

I kinda think this may be power related vs cables. I will try to connect without the splitters and see if that helps.

Yeah… if you’re going to be doing things like that you take them one drive at a time…

:grimacing:

Yeah that’s kinda wack.

Ok, so I put the data connectors back in their original locations, removed the SATA power splitters and used the PSU’s 4-way SATA power cables. Booted up and the pool is back online, but with the original disk showing unavailable.

So now I will start replacing data cables one at a time, as I don’t think it’s power related at this point. The problem seems to be the same disk each time.

My advice, since you are dealing with RAIDZ1 vdevs: while the TrueNAS system is running, obtain all the drive serial numbers that you can read in the GUI. This will leave one drive that is not readable because it is unavailable.

Next, power down, remove each drive one at a time. Examine the serial numbers of each drive then mark each drive in a way that you can read the last 4 digits of the serial number so you don’t go through this again.
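If you prefer to do that bookkeeping from a shell rather than the GUI, here is a rough Python sketch. It assumes a Linux-based install where `lsblk -d --json -o NAME,SERIAL` is available; the function names and the serial numbers in the sample data are made up for illustration and are not from this system.

```python
import json
import subprocess


def serials_from_lsblk_json(text):
    """Return {serial: device_name} parsed from `lsblk --json -o NAME,SERIAL` output."""
    devices = json.loads(text).get("blockdevices", [])
    # Skip devices that report no serial (the field can be null).
    return {d["serial"]: d["name"] for d in devices if d.get("serial")}


def read_visible_serials():
    """Ask lsblk for whole disks only (-d) and parse what the OS can currently see."""
    out = subprocess.run(
        ["lsblk", "-d", "--json", "-o", "NAME,SERIAL"],
        check=True, capture_output=True, text=True,
    ).stdout
    return serials_from_lsblk_json(out)


def missing_serials(labelled, visible):
    """Serials written on the drive labels that the OS does not currently report."""
    return sorted(set(labelled) - set(visible))


# Hypothetical data: three labelled drives, only two visible to the OS.
SAMPLE = (
    '{"blockdevices": ['
    '{"name": "sda", "serial": "SN0001"}, '
    '{"name": "sdb", "serial": "SN0003"}]}'
)
LABELLED = ["SN0001", "SN0002", "SN0003"]
```

On a live system you would call `missing_serials(LABELLED, read_visible_serials())` with the serials you wrote on the drives; whatever comes back is the drive the OS cannot see.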

Once you have identified the drive serial number you have not written down, that is the drive with a problem. You can replace the data connector if desired; however, let me ask you a few questions:

  1. How are the drives physically connected to the system? a separate HBA or directly to the MB?
  2. If connected to the MB, what SATA connector on the MB are you using? One that you previously have not used?

Why do I ask? Sometimes people forget or don’t realize that on some MBs the SATA ports may be shared with an NVMe drive connection, or similar, and the board only supports one or the other connection, not both at the same time. Make sure this isn’t your situation.

Good Luck

Thanks! So I just finished removing one drive at a time and waiting for the system to boot with all drives online except for the one that had issues. I think I found it. I replaced the SATA cable for this one and now ALL 8 are online, however in a degraded state, probably due to booting up with multiple failed drives during this process. I have full USB backups so I’m not too worried if it fails.

I think next up is a scrub to try to clean up errors?

I like the idea of writing down the SNs. I will do that if/when I do the next drives.

To answer your questions, all SATA drives are plugged directly into the MB SATA ports. This MB has 8, so I think it should work, but here are the details:

ASRock MB:

X570D4U-2L2T

https://www.asrockrack.com/general/productdetail.asp?Model=X570D4U-2L2T#Specifications

Here is how the drives are listed for now. Is the scrub a good idea now?

Ok, so Scrub is running now. Will report results soon, hopefully this fixes everything.

Now this is embarrassing, since nearly everything in my build is new and/or in mint condition. However, since I did not have enough SATA cables for the 2 additional SSDs I added, I thought I would use a spare I had lying around from a previous build. I think this may have been the culprit, as the one I replaced looked pretty beat up and the replacement seemed to fix the issue!! LOL

Anyway, I think I may order all 8 new SATA cables at some point just to have them all new and the same type. I will also document the SNs once I do this clean-up.

Quick update. So I ran the scrub then ran ‘zpool clear’, then rebooted. All drives are now available and no errors! I think we are good now. However I just ordered a 12 pack of sata cables from Amazon for $11. I will install these this week for good measure!
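For anyone wanting to script the same scrub-then-clear sequence, here is a minimal Python sketch. It only relies on `zpool scrub`, `zpool clear` and `zpool status -x`; the pool name `tank` and the helper names are placeholders I made up, and the health check assumes the usual "all pools are healthy" / "pool 'tank' is healthy" wording of `zpool status -x`.

```python
import subprocess


def zpool_cmd(action, pool):
    """Build the argument list for `zpool scrub <pool>` or `zpool clear <pool>`."""
    if action not in ("scrub", "clear"):
        raise ValueError(f"unsupported action: {action}")
    return ["zpool", action, pool]


def reports_healthy(status_x_output):
    """True if `zpool status -x` output says the pool(s) are healthy."""
    text = status_x_output.strip()
    return text == "all pools are healthy" or text.endswith("is healthy")


def run(cmd):
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Hypothetical pool name; replace with your own.
POOL = "tank"
```

In practice you would `run(zpool_cmd("scrub", POOL))`, wait for the scrub to finish (the command returns right away), then `run(zpool_cmd("clear", POOL))` to reset the error counters, and finally pass the output of `run(["zpool", "status", "-x", POOL])` to `reports_healthy` to confirm nothing is still flagged.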

Thanks everyone for the help. I think we are good.

Great news, glad it was that simple. That is one nasty looking cable. What brand of cable did you purchase? Do they also lock? The price is pretty low so I’m curious. And maybe once you install them, report back how well they are working.

So I was not sure how much the brand name mattered in this case, but the reviews looked great, including one user who used them on his NAS with excellent results. They do appear to lock:

Link:

Yes, I can report back once everything is set up with the new cables. If you feel these are not great, feel free to suggest another model, as I can easily return these before opening them.

Thanks for the link, and yes, the reviews look good. I’d like to go through all my cables and throw them away, then buy a variety of cables that are reliable. But that is a lot of money, so I will just need to wait until I need to replace a few.

So for the last couple of days, while waiting for my new SATA cables, all drives have remained online and working perfectly!

I just installed these new cables and they look and work perfectly. Very happy with the quality and price. So far would definitely recommend.

FWIW, I bought some of these a while ago and all work(ed) fine.

I like to add some DeoxIT to my connections where possible. I don’t have an ownership stake in or connection to the brand other than being a happy customer. Oxidized, high-impedance contacts can be pretty unhappy.

Which flavor? DeoxIT D5, spray, liquid, Gold? I have never heard of this before; I wish I had years ago.

Glad to hear your system is running and hopefully error free for a long time. I may look into buying those cables, although I would need some 90 degree ends, bending the proper direction.

For most applications, I prefer the needle version of DL100 because it usually allows better control of where the drops go. The brush version works well for larger surface areas, as does the spray. But for electronics, the needle version is ideal for getting into even narrow pin headers. The needle ensures the liquid gets to the right spot, and capillary action then draws it in.

They also market versions for gold contacts and for faders / potentiometers. Presumably, the gold version has a stronger flux in it, while the fader lube is no doubt some special formulation. But for general applications, I doubt you can go wrong with just the original DL100. Cheers!

Me too. I’d go for some hefty locking cables. Garbage cables - and the hell that follows - just aren’t worth the time or effort.