TrueNAS Scale suddenly can't read more than 9 SAS Disks

Hello everyone,
I’m new here. I’ve been using TrueNAS for some time now as a Plex media center and general storage, but on Monday something strange happened.
Here’s my config:
Case:
CPU: intel i7 6700k
Ram: 4x4gb DDR4
Mobo: Asus ROG Z170
HDD: 12x HGST HUS72403CLAR3000 SAS
PSU: Corsair RM1000x SHIFT
HBA Cables: 3x Mini SAS SFF-8087 36 Pin TO 4xSFF-8482 29+15 Pin
HBA: 2x DELL PERC H200 LSI SAS2008 - flashed in IT

truenas_admin@truenas[~]$ sudo sas2flash -list -c 1
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2008(B2)

        Controller Number              : 1
        Controller                     : SAS2008(B2)
        PCI Address                    : 00:04:00:00
        SAS Address                    : 54dae52-0-ac07-d555
        NVDATA Version (Default)       : 14.01.00.08
        NVDATA Version (Persistent)    : 14.01.00.08
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               : 20.00.07.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9211-8i
        BIOS Version                   : N/A
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : 6Gbps SAS HBA
        Board Assembly                 : N/A
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.
truenas_admin@truenas[~]$ sudo sas2flash -list -c 0
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2008(B2)

        Controller Number              : 0
        Controller                     : SAS2008(B2)
        PCI Address                    : 00:01:00:00
        SAS Address                    : 5d4ae52-0-7642-2200
        NVDATA Version (Default)       : 14.01.00.08
        NVDATA Version (Persistent)    : 14.01.00.08
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               : 20.00.07.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9211-8i
        BIOS Version                   : N/A
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : 6Gbps SAS HBA
        Board Assembly                 : N/A
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.

I’ve been having some problems with my main pool degrading. It was always a single disk (recently replaced, so I suspected the PSU instead), and each time I had to fix it with a zpool clear HDD. On Monday, though, the pool suddenly lost 3 of my 12 disks. I tried reconnecting everything and checked all the SAS cables. I suspected the old PSU had lost a rail, since I also had to use adapters to power the SATA connectors from MOLEX cables.
I decided to upgrade the PSU and bought a Corsair with 16 SATA connectors.
Today I redid all the connections with the new PSU, and still only 9 of the 12 disks show up.
I thought the 3 missing disks themselves were the problem, but when I swap them onto a connection that previously worked, they show up.
So I thought it was the cable, but I replaced it with another one that I know works, and they still do not show up.
I also moved the cable to a different HBA port; nothing changed.
It seems that TrueNAS has decided that 3 disks cannot be read or seen by any means, and I don’t know why.

Thank you for the detailed information.

Just to clarify and consolidate:

  1. You have 3 drives that refuse to be connected to TrueNAS.
  2. These 3 drives do not appear to be failed/bad, they work when connected to a different data connector where a drive does get recognized.
  3. You replaced the data cable that would be suspect, no change.
  4. You swapped HBA data ports, no change.
  5. You replaced the PSU, no change.

Here is where I need clarification:

  1. When you swapped the HBA data ports, did you swap it with a currently working HBA port? And if yes, did the drives connected to that port originally now fail when connected to the HBA port that is related to the failure? I’m not saying the HBA port is faulty, I’m just gathering details.
  2. Post zpool status -v
  3. Post what version of TrueNAS you are running.
  4. You seem to be very good at troubleshooting, taking matters into your own hands to figure out what might be going on before asking for help. I love it. I wish more people were like that.
  5. Now think back, did you move the computer? Update any software? Sudden power loss to the entire system? Reboot or Shutdown?

I need you to think back several days before the incident, not just the day it occurred.

Here is a good link to read, might be similar.

You might also post some of the requested data that this thread lists to provide.
@Protopia is apt to jump in and save the day.

I would have already jumped in (and possibly saved the day) if I had any good ideas to add - but unfortunately I don’t.

As Joe has already said, you have already tried all the swap-things-around options there seem to be and they haven’t helped, and I haven’t got a single further action for you to try.

The only thing I would ask is exactly how do you know that the drives are not showing up (and do show up when connected to another port)? Are they missing from lsblk?
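For anyone wanting to make that check repeatable, here is a minimal sketch in Python. It assumes a Linux system where `lsblk` supports JSON output (`-J`), and the serial numbers shown are hypothetical placeholders, not the poster's real drives. `list_disks` and `missing_serials` are illustrative helper names, not part of TrueNAS.

```python
import json
import subprocess


def list_disks():
    """Return whole-disk entries from lsblk (Linux only; not called in the demo below)."""
    out = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,SERIAL,MODEL,SIZE"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["blockdevices"]


def missing_serials(expected, disks):
    """Return the expected serial numbers that no listed disk reports."""
    seen = {(d.get("serial") or "").strip() for d in disks}
    return sorted(s for s in expected if s not in seen)


# Hypothetical sample data standing in for list_disks() output.
sample_disks = [
    {"name": "sda", "serial": "SERIAL-A", "model": "HUS72403CLAR3000", "size": "2.7T"},
    {"name": "sdb", "serial": "SERIAL-B", "model": "HUS72403CLAR3000", "size": "2.7T"},
    {"name": "sdc", "serial": None, "model": "boot SSD", "size": "120G"},
]
expected = ["SERIAL-A", "SERIAL-B", "SERIAL-C"]
print(missing_serials(expected, sample_disks))
```

On the real system you would pass `list_disks()` instead of `sample_disks`, with `expected` filled in from the drive labels.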

Also, I am not sure whether you can see any of the drives on your HBAs from BIOS, but if you can do these missing drives appear there?

Hello!
Thank you for your reply, I’ll try to answer everything:

  1. I have 2 HBAs with 2 ports each. I tried pretty much every cable/drive/HBA port combination and I can’t seem to find a pattern. Depending on the combination, after a reboot some drives show up and sometimes they are missing. This is going to need more trial and error; I’m thinking of keeping a spreadsheet to track it.
  2. The “HDD” pool is now exported, because it failed and dropped all the disks once the 3 disks went missing. I didn’t want to risk losing data.
  3. I was using 24.10.0 (Electric Eel) and tried updating today to 25.04.1 (Fangtooth); nothing changed.
  4. Thank you! I’m a sys admin by profession, and the troubleshooter for all my friends and family.
  5. Not moved. I only updated Plex when it asked to be updated. No power loss, it’s behind an APC UPS. I shut it down whenever I’m not using it, since my wife doesn’t want an additional heater in the house.
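The tracking spreadsheet from point 1 could be as simple as a CSV file with one row per drive per boot. Below is a rough sketch, assuming made-up column names (`boot`, `hba`, `port`, `cable`, `serial`, `seen`) and hypothetical trial data; `failure_counts` is an illustrative helper that tallies how often each component was involved when a drive went undetected.

```python
import csv
import io
from collections import Counter

FIELDS = ["boot", "hba", "port", "cable", "serial", "seen"]


def failure_counts(rows):
    """Count, per component value, how often an attached drive was not detected."""
    counts = Counter()
    for row in rows:
        if row["seen"] == "no":
            for key in ("hba", "port", "cable", "serial"):
                counts[(key, row[key])] += 1
    return counts


# Hypothetical trial log; an in-memory buffer stands in for a real CSV file.
log = io.StringIO()
writer = csv.DictWriter(log, fieldnames=FIELDS)
writer.writeheader()
writer.writerows([
    {"boot": 1, "hba": "0", "port": "0-A", "cable": "c1", "serial": "SERIAL-A", "seen": "yes"},
    {"boot": 1, "hba": "1", "port": "1-B", "cable": "c3", "serial": "SERIAL-B", "seen": "no"},
    {"boot": 2, "hba": "1", "port": "1-B", "cable": "c2", "serial": "SERIAL-A", "seen": "no"},
    {"boot": 2, "hba": "0", "port": "0-A", "cable": "c3", "serial": "SERIAL-B", "seen": "yes"},
])

log.seek(0)
counts = failure_counts(csv.DictReader(log))
for (component, value), n in counts.most_common():
    print(component, value, n)
```

These are raw counts, so a component that was used in more trials will naturally collect more failures. With only a handful of boots that is worth keeping in mind before blaming any one HBA, port, or cable.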

I can’t recall anything unusual in the days before the incident… I’ll ask my wife if she did anything.

I’ll read that topic asap!

As for the lsblk question: yes, they are missing completely. 12 drives connected, only 9 sdX devices show up. I know which ones show up by reading the serial numbers from the Storage → Disks list in the TrueNAS GUI.

The BIOS doesn’t show any drives, only the SSD I’m booting TrueNAS from. I’m going to double-check that once I’m back home.

Thank you both for your time :slight_smile:

@alexaldin It shows that you are a Sys Admin.

As for the troubleshooting, make sure you also track the drives by serial number. I’m sure you already know that the Device ID can and will change and is not tied to a specific drive. I would track the Device IDs as well, just in case there is something strange going on there.
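One way to track both at once is to record a serial-to-device mapping on each boot and compare the snapshots. This is only a sketch with hypothetical serials and device names; `diff_snapshots` is an illustrative helper, and the mappings would have to be filled in from whatever the system reports on each boot.

```python
def diff_snapshots(before, after):
    """Compare two {serial: device_name} mappings taken on different boots."""
    common = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),   # seen earlier, gone now
        "new": sorted(set(after) - set(before)),       # not seen earlier
        "renamed": sorted(s for s in common if before[s] != after[s]),
    }


# Hypothetical snapshots from two boots.
boot1 = {"SERIAL-A": "sda", "SERIAL-B": "sdb", "SERIAL-C": "sdc"}
boot2 = {"SERIAL-A": "sdb", "SERIAL-B": "sda"}

result = diff_snapshots(boot1, boot2)
print(result)
```

A drive listed under `renamed` is still present and only got a different device name, which is normal. Only the `missing` list points at drives that actually dropped off.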

If you made a change 2 weeks ago, that could affect it, but if you are powering down every day/night, then I would expect the issue to show up right away.

If you haven’t done so already, you might try this:

  1. Boot from an Ubuntu Live CD. Can you see all the drives?
  2. Do a Clean Install of TrueNAS SCALE, 25.04.1 is fine, but do not restore the config file. Leave it untouched. Can you see all the drives now?

If you can see the drives under one of these conditions, then the hardware (HBA) “should” be good.

Something else: can you move your HBA to another slot? I didn’t look at your hardware to see what it is capable of, but if you can and haven’t already done so, it is worth a try.

Best of luck.
