SMART Tests Haven't Been Running?

Put the answer on the line below [ /quote ] :wink:

The point was to show you can actually connect 20 drives to a single HBA. It takes suitable breakout cables (SFF-8643 to 4*SATA), much like the ones you probably already use (SFF-8087 to 4*SATA?).
The expander is cheap—and does not need a PCIe slot, it only needs power.

That. And checking temperatures, airflow, and how the fans are regulated.
And quite possibly changing for more powerful fans. If the drives are over 50°C in a cold room, and have reached 65°C in summer, just adding one fan is unlikely to do it.
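One way to do the temperature check mentioned above is to read each drive's SMART attribute table (`smartctl -A /dev/sdX`) and pull out the raw value of `Temperature_Celsius`. The Python sketch below parses text in the usual ATA attribute layout and uses the 50°C figure from this post as a hypothetical warning threshold. The sample text and the function names are made up for illustration, and some drives report temperature under a different attribute name, so treat it as a starting point only.

```python
import re

# Sample text in the usual `smartctl -A` ATA attribute layout (illustrative values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       26280
194 Temperature_Celsius     0x0022   119   103   000    Old_age   Always       -       52 (Min/Max 20/65)
"""

def raw_attribute(smart_text, name):
    """Return the leading integer of an attribute's RAW_VALUE, or None if absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None

def too_hot(smart_text, limit_c=50):
    """True if the reported temperature is above limit_c (hypothetical threshold)."""
    temp = raw_attribute(smart_text, "Temperature_Celsius")
    return temp is not None and temp > limit_c
```

Running `too_hot()` on the output for each drive, before and after a fan change, gives a quick way to see whether the change actually helped.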

Dunno what that means/involves.

@koberulz
I live, eat, breathe airflow, and am not shy about modifying a case to make sure proper cooling is occurring. It is one of the things people overlook when building a server.

For example: The HBAs you have, here is a quote from the docs I quickly looked up

These things are designed to work in a true server chassis with all those really loud screaming fans. I’m not telling you that you need loud fans, but I am saying you need adequate cooling.

You have other components which need proper airflow as well, the CPU, RAM, Motherboard chipset. Basically everything.

I know your server has been running for a while and you might feel it is okay as-is. That is entirely up to you. Again, we are only offering advice which may help you.

As for the HBA, you could reduce it down to a single HBA if you can obtain the expanders. Do some research and see if that is what you would like to do. There are a lot of forum threads talking about expanders here.

Those drives are hot, I suspect you have them stacked on top of each other. Space them out if you have not already done so. This will help dissipate the heat generated.

Dang, another thunderstorm rolling in. Time to go.


You and everybody else.

I have the two enclosures at the bottom of the case filled, and two drives attached to the…ceiling, I guess? I’m not sure where the rest are.

I’m not sure what the best way forward is regarding the HBA situation, I’ve had a couple of options thrown out that I don’t really understand, or even know enough about to properly investigate (though it’s also not a good time for me to be investigating things right now either, life is somewhat swamping me at the moment).

Finally figured out how to initiate an RMA from Australia, so I’ll get the ball rolling on that one. But I do kinda figure if I’m opening it up and doing work, might as well get it all done because it’s a pain in the neck to get at the insides of this thing.

I remember asking this before, in a different thread—possibly even a different forum—but how does TrueNAS know which disk is which? That is, if I unplug drives do I have to be careful about plugging them in with the exact same cable? If so, how would swapping the HBA work? If not, what exactly is the identifier?

I probably need to sort out some sort of backup for the OS drive, too, since I’m assuming if that dies I lose everything? I know I should technically have a whole second NAS backing up this NAS, but I’m not made of money.

TrueNAS (well, ZFS) does not care, so long as all the drives are connected when you boot the system back up. That is one of the great advantages for me: I can move my drives to any machine, hook them up, and run the system.

Nope, not true. You should maintain a copy of your TrueNAS config file for quick and easy restoration if the boot drive fails.

I think you should read up on TrueNAS in the online Docs.
There is a lot of good information there, including how to replace a failed drive, backing up your configuration, SAS Expanders, and a lot of other stuff. The link is for EE, which is what you said you were running.


So what is it using to know which drive is which then?

WD won’t let me RMA the drive without registering it, which requires providing proof of purchase, which requires knowing which of the half-dozen invoices for 20TB WD Reds goes with this drive. Which I have no way of doing…

The problem is that’s an overwhelming amount of info…at present I really just need some specific input on the HBA and cooling issues so I can get those resolved.

A few things, but one is the PARTUUID shown in the lsblk and zpool status commands you were asked for earlier. Another is the metadata stored in the ZFS labels on each disk.
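To make the PARTUUID point concrete, here is a rough Python sketch that takes JSON in the shape produced by `lsblk -J -o NAME,SERIAL,PARTUUID` and builds a lookup from each partition's PARTUUID back to the disk it lives on. The sample data, serials, and the function name are invented for illustration; your real output will have more devices and different values.

```python
import json

# Sample in the shape produced by `lsblk -J -o NAME,SERIAL,PARTUUID` (values invented).
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "serial": "SERIAL-A", "partuuid": null,
   "children": [{"name": "sda1", "serial": null, "partuuid": "1111-aaaa"}]},
  {"name": "sdb", "serial": "SERIAL-B", "partuuid": null,
   "children": [{"name": "sdb1", "serial": null, "partuuid": "2222-bbbb"}]}
]}
"""

def partuuid_to_disk(lsblk_json):
    """Map each partition's PARTUUID to (parent disk name, parent disk serial)."""
    mapping = {}
    for disk in json.loads(lsblk_json).get("blockdevices", []):
        for part in disk.get("children", []):
            if part.get("partuuid"):
                mapping[part["partuuid"]] = (disk["name"], disk.get("serial"))
    return mapping
```

The lookup is keyed on the PARTUUID, which travels with the partition itself. Names like `sda` or `sdb` can change when cables or HBAs are swapped, but the identifier that `zpool status` shows does not, which is why the cable order does not matter.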

This is a tricky one. If these drives were new when you bought them, and if you have always put them in your system in the order you bought them never leaving one sitting on the shelf, one way would be to look at the Power_On_Hours of all drives and sort them from oldest to newest. Then cross reference with your invoices, oldest to newest.
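The cross-referencing idea can be sketched in a few lines of Python: sort the drives by Power_On_Hours (most hours first), sort the invoices by date (earliest first), and pair them up. The serials, hours, and invoice numbers below are hypothetical, and the pairing is only as good as the assumptions stated above (drives bought new and installed in purchase order).

```python
from datetime import date

def match_by_age(power_on_hours, invoice_dates):
    """Pair drives (most hours first) with invoices (earliest first).

    Only meaningful under the assumptions in the post above: drives bought new
    and installed in purchase order, one drive per invoice entry.
    """
    drives = sorted(power_on_hours, key=power_on_hours.get, reverse=True)
    invoices = sorted(invoice_dates, key=invoice_dates.get)
    return list(zip(drives, invoices))

# Hypothetical serials, hours, and invoice numbers for illustration only.
hours = {"SERIAL-A": 26000, "SERIAL-B": 9000, "SERIAL-C": 17500}
invoices = {
    "INV-3": date(2024, 2, 1),
    "INV-1": date(2022, 1, 10),
    "INV-2": date(2023, 3, 5),
}
pairs = match_by_age(hours, invoices)
```

If two drives have nearly identical hours, they were probably bought together, and the match between them is a coin toss without a serial number on the invoice.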

You can also ask the seller(s) if they recorded serial numbers of items they sold you. Some do for purposes of validating returns and such.

That is nuts. My three WD drives are well over the warranty period so I do not have to worry about that. :face_with_open_eyes_and_hand_over_mouth:

Magic, or what @neofusion said.

Trust me when I say that I understand. But if you do not at least take the time to read the documents that are presented, you are not going to be helping yourself. When a problem comes up, do you expect to just ask a forum question each time? I hope not. That is not really what we are about. We help, but sometimes that help is to point you toward documentation that will benefit you much more in the long run.

TrueNAS is not a build it and forget it system. If you do that, you are looking at trouble down the road.


Depends on the problem, really.

And when it comes to things like cooling and HBAs, that seems like a more general issue the docs are unlikely to address anyway.

I was able to get the US live chat support to register the disk for me, so I’ve been able to arrange the RMA. Now I just have to figure out where it is and get it out of the case. And, I guess, see if I can plug in that other fan while I’m at it.

While the system is down, make sure to mark the other drive(s) with an easy-to-see serial number so you will not have to guess next time.


The canned schedule is for midnight on a Sunday, which a) would mean I would have to run all drives at once, and b) errors when I try to set it because there is already a short test scheduled for that time.

Can I get something specific on the HBA issue? If I need to be replacing what’s in there, I’d rather do it while I have everything open sorting this drive out, and it seems like something that needs to be done sooner rather than later to prevent all my drives cooking anyway. The cards I have were what was recommended to me when I was first asking around about setting this up. I don’t know enough to know what I’m looking for, or tell good info from bad, and I’m not sure the TrueNAS docs would even cover this as I said—you mentioned the section on SAS Expanders but that’s kind of lacking in specific detail except to note they’re not recommended for SATA drives anyway.

Not necessarily specific to your exact HBA or firmware, but it goes over the steps for finding and flashing firmware in pretty good detail. There should be a lot of parallels.

Edit: I’m also fairly certain that I previously linked to a post with your exact model of HBA as someone else in the past did upgrade the firmware; likely they also included steps & their experiences.

Well it seems from the discussion in this thread that I don’t need to update the firmware so much as replace the hardware…

Okay, so you can’t pick the same time to start both the short and long tests, is there really nothing you can do differently? *hint hint* *nudge nudge*

Not without using a custom schedule?

What is the problem here?

@joeschmuck is suggesting that the use of a custom schedule may be part of the problem, and that I should avoid it.

Fair enough.
May I propose a shuffle?

Remove the Short test from the schedule.
Schedule the Long test using the “canned setting”, Sunday at midnight.

Leave it and see what happens.

If you want to schedule your SMART tests without actually scheduling them, try out the Drive-Selftest script. It is fairly small and runs independently of Multi-Report if desired. I would say the easiest way is to simply use Multi-Report with the default values: all you need to do is enter your email address in the configuration setup, then set up one cron job to run the script once daily.

The defaults are daily Short and weekly Long (spread out over the week).
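To picture what "spread out over the week" could look like, here is a small Python sketch that assigns each drive a weekday for its Long test in round-robin order. This is not the Drive-Selftest script's actual logic, just an illustration of the idea, and the drive names are hypothetical.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def spread_long_tests(drives):
    """Assign each drive a weekday in round-robin order so Long tests are spread across the week."""
    schedule = {day: [] for day in DAYS}
    for index, drive in enumerate(sorted(drives)):
        schedule[DAYS[index % len(DAYS)]].append(drive)
    return schedule

# Twenty hypothetical drives, named disk00 .. disk19.
plan = spread_long_tests([f"disk{n:02d}" for n in range(20)])
```

With 20 drives, no day ends up with more than three Long tests, instead of all 20 starting at midnight on Sunday.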

Take a look using either my signature link or the Resources section to find it. Over 10,000 people can’t be wrong. Okay, maybe it is more like 200 people. I wish I knew, but I never will. And if you have a question about these scripts, use the Multi-Report forum thread, as many people will likely answer your question before I see it.

I uh… seem to have missed the part where your HBA was deemed faulty in any way - I’d still argue that getting on the latest working firmware has a decent enough chance to mitigate various issues & could remove the need to replace anything physically.
