2/6 Drives Cause TrueNAS Core to Bootloop

Not sure why but when I rebuilt my NAS, it stopped booting and started bootlooping. It would go through the whole bootup process and stop at the same part every time.

I did some basic troubleshooting to see if it was a hardware problem, such as reseating the CPU, RAM, HDDs, etc. It seems to be two drives in particular, because when I don’t connect them it boots just fine and I can see it on the Web UI. I talked to a friend about it and they said that there might be a problem with the drives. I’m not sure how to begin troubleshooting or running diagnostics. Is there any way I can get the data off those drives onto another one, or get them into working shape again?

Before you ask, the drives have been out of warranty for a while now (sad face).

Would anyone be able to help me with this by providing some insight or linking some guides on how to do this? I have a learning disability, so please be patient with me. I’m a total beginner with TrueNAS.

Thank you for reading.

Specs:

Intel i5-3470
2 x 8GB Non-ECC RAM
SZMZ B75-MS MINI ITX
6 x 4TB Western Digital Red
LSI SAS9211-8I (IT Mode)
1 x 120GB SSD

I am on the latest version of TrueNAS Core.

Your hardware is pretty old if I am seeing the correct info for the Intel Processor and the motherboard.

I am guessing the SSD is your boot pool / boot drive. How are all your drives connected? Are all six 4TB drives connected to your LSI SAS9211?

What is your pool setup? All six in one VDEV with Raid-Z1? Mirrored sets?

We will need the model number of all your hard drives. If they are all the same, just post once and note. We are looking to make sure your WD Red drives are CMR and not SMR for the way they record data.

Latest version of Core, is it TrueNAS-13.0-U6.2? or is it something slightly different? TrueNAS-13.0-U6.3 was just released Thursday

ZFS Primer in case you need to look up terms I used above.

Your hardware is pretty old if I am seeing the correct info for the Intel Processor and the motherboard.

Yes, but I was told that it should be sufficient. Was I misinformed?

I am guessing the SSD is your boot pool / boot drive. How are all your drives connected? Are all six 4TB drives connected to your LSI SAS9211?

Yes, the SSD is my boot drive. The other HDDs are all connected to the HBA card.

What is your pool setup? All six in one VDEV with Raid-Z1? Mirrored sets?

I’m pretty sure they were all just mirrored. I had them paired up.
[I just checked by booting up a couple of them and they are in fact mirrored.]

We will need the model number of all your hard drives. If they are all the same, just post once and note. We are looking to make sure your WD Red drives are CMR and not SMR for the way they record data.

3 are WD40EFAX and 3 are WD40EFRX.

Latest version of Core, is it TrueNAS-13.0-U6.2? or is it something slightly different? TrueNAS-13.0-U6.3 was just released Thursday

You’re right. It was U6.2. I thought it was U6.3 because of how recently I updated.

I will take a look at both links you sent me.

Thank you so much for the reply.

The EFAX, I think, are problem drives with ZFS. Double check their type.

I am sure someone will want to see your pool status and layout

Please go to the shell window in the GUI and run
zpool status

Copy and paste the results back in a reply using Preformatted text (</> icon above).
CTRL + INS (control + insert) should allow you to copy from that shell window, or it will give you guidance when you attempt to copy or paste to the window.

PDF confirms as SMR
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/wd-red-hdd/product-brief-western-digital-wd-red-hdd.pdf
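
For quick reference, here is a tiny shell sketch that records the recording type of the two models in this thread. The function name `wd_red_recording_type` is made up for illustration. The WD40EFAX entry follows the product brief above; the WD40EFRX entry is based on WD's published CMR/SMR list rather than that PDF, so double check it against the labels on your own drives.

```shell
# Classify a WD Red model number by recording technology.
# Only the two models mentioned in this thread are covered;
# anything else is reported as "unknown".
wd_red_recording_type() {
    case "$1" in
        WD40EFAX) echo "SMR" ;;      # SMR per the WD product brief
        WD40EFRX) echo "CMR" ;;      # assumption: CMR per WD's CMR/SMR list
        *)        echo "unknown" ;;
    esac
}

# Print the type for each model in the thread.
for model in WD40EFAX WD40EFRX; do
    printf '%s: %s\n' "$model" "$(wd_red_recording_type "$model")"
done
```

With three EFAX and three EFRX drives, that would mean half of the disks are SMR.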

The EFAX, I think, are problem drives with ZFS. Double check their type.

Yeah, I did a quick Google and they are the SMR drives.

Last login: Sat Nov 23 21:13:16 on pts/1
FreeBSD 13.1-RELEASE-p9 n245431-b8ec9bde091 TRUENAS

    TrueNAS (c) 2009-2024, iXsystems, Inc.
    All rights reserved.
    TrueNAS code is released under the modified BSD license with some
    files copyrighted by (c) iXsystems, Inc.

    For more information, documentation, help or support, go here:
    http://truenas.com

Welcome to TrueNAS

Warning: the supported mechanisms for making configuration changes
are the TrueNAS WebUI and API exclusively. ALL OTHERS ARE
NOT SUPPORTED AND WILL RESULT IN UNDEFINED BEHAVIOR AND MAY
RESULT IN SYSTEM FAILURE.

root@lilNasX[~]# zpool status
pool: OKAARA
state: ONLINE
scan: resilvered 6.09M in 00:00:01 with 0 errors on Thu Nov 21 21:46:24 2024
config:

    NAME                                            STATE     READ WRITE CKSUM
    OKAARA                                          ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/1345882a-2a7b-11ed-942a-94de80684542  ONLINE       0     0     0
        gptid/1357b971-2a7b-11ed-942a-94de80684542  ONLINE       0     0     0

errors: No known data errors

pool: boot-pool
state: ONLINE
config:

    NAME

Here are the pools. What’s interesting is it will bootloop when drive number 5 is connected before booting up, but if I get the system up and running and then connect the 5th drive it will detect it and everything. Probably not the best idea but I was just very curious if it would work.

Did you have three mirrored vdevs of 2 disks each, with each vdev in its own pool?
OKAARA is listed above with two disks in the vdev.

I would get some paper and write down this information so you can refer to it.
You will want to write down the info for the pool shown and its disks.
You want the model numbers of the two disks you are showing and their serial numbers.
We are trying to put together a list so that each drive and all of its information stay together.
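
If writing it all down by hand is a chore, a rough sketch like the one below can pull the model and serial number out of `smartctl -i` output. The helper name `model_and_serial` is hypothetical, the sample text in the demo is made up, and it assumes the usual ATA field labels `Device Model:` and `Serial Number:`. The loop in the last comment assumes FreeBSD's `sysctl -n kern.disks` lists your disks.

```shell
# Pull the model and serial number out of `smartctl -i` output read on stdin.
# Assumes the usual ATA field labels "Device Model:" and "Serial Number:".
model_and_serial() {
    awk -F': *' '
        /^Device Model:/  { model = $2 }
        /^Serial Number:/ { serial = $2 }
        END { print model " | " serial }
    '
}

# Demo with made-up sample text (not real output from this system).
printf 'Device Model:     WDC WD40EFAX-EXAMPLE\nSerial Number:    WD-EXAMPLE123\n' |
    model_and_serial

# On the NAS itself, something like this should walk every disk:
#   for d in $(sysctl -n kern.disks); do smartctl -i "/dev/$d" | model_and_serial; done
```

Each output line then gives you one model and serial pair to copy onto your list next to the pool name.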

I would boot up with just the two disks you have showing above and see what is showing in the GUI and what datasets and folders are associated with the OKAARA pool. This is your chance to get copies of the data or at least know what is saved where.

If I am guessing correctly on your pools and vdev layout, we may be able to get the data, going pool by pool. Try with the above first. You can take screenshots and trim them with the Snipping Tool in MS Windows. Save and label those pictures if you need help remembering info from the GUI.

At this point we are just trying to work with the setup you just showed. Don’t make any other changes. Post results and ask questions if uncertain.

Did you have three mirrored vdevs of 2 disks each, with each vdev in its own pool?
OKAARA is listed above with two disks in the vdev.

I had 3 pools mirrored with two drives on each.

  1. MOGO (#XHA, #PJP) both disks working, wiped
  2. OKAARA (#H67, #2YK) both disks working, using for Plex
  3. QWARD (#7E9, #2FC) 2FC does not show up at all, 7E9 will show up in Disk Management but causes bootloop when I start TrueNAS with it connected.

Is there any way to find out why 7E9 causes the bootloop, even though it still shows up in Windows Disk Management, and also under ui/storage/disks if I start TrueNAS, log in to the Web UI, and then connect it? Is there any way to repair the disk and get all the data from 7E9 (QWARD) back?

I meant to have the hard drive model along with the serial number associations. I am not sure what the # with three characters is. It looks unique so that is good.

I wanted to see the drive models to figure out which were

If I understand correctly, you wiped the disks or pool for MOGO (listed as 1), so we do not have to worry about that set, but I still would like to have the model association from you.
I take it you have tried to boot just the QWARD pool disks as a set and each one individually with nothing but the boot disk attached?

Do you know what pool you were using for the system dataset?
This is an example from my Core machine. I only have one pool. It shows (System Dataset Pool) next to the pool name, Shay.

I meant to have the hard drive model along with the serial number associations. I am not sure what the # with three characters is. It looks unique so that is good.

I wanted to see the drive models to figure out which were

The 3-character codes are the last 3 characters of each serial number. I left out the rest because whenever I see drives for sale the serial number is always blocked out, so I assume there’s some important reason not to share it, and it also makes the serial number shorter to look at.

MOGO

    1. XHA - WD40EFAX
    2. PJP - WD40EFRX ←

OKAARA

    1. H67 - WD40EFAX
    2. 2YK - WD40EFRX ←

QWARD

    1. 7E9 - WD40EFAX (detected in disk management and storage/disks)
    2. 2FC - WD40EFAX (not detected at all and probably dead)

I take it you have tried to boot just the QWARD pool disks as a set and each one individually with nothing but the boot disk attached?

I did try just the QWARD pool disks as a set and they bootloop. I also tried just 7E9 and it bootloops as well. I tried again just before writing this just to confirm and yep bootloop.

Do you know what pool you were using for the system dataset?

Not sure what you mean by the system dataset? Do you mean the boot drive or is it something else?


Your screenshot of Pools above says MOGO (System Dataset Pool). You probably need those disks attached, since that pool holds the system dataset.

What did "wiped" mean when you posted this? Are the data and pool destroyed?

You know what? I don’t actually remember for sure if MOGO was wiped. iirc that should have been the one with the more important information so thanks for asking that question before I deleted it for good or something

Maybe it was QWARD that I wiped instead? The thing is, when I try to Import Disk, neither MOGO nor QWARD comes up as an option. It only shows OKAARA. Maybe I will look at one of the guides on importing tomorrow on my day off to see if I’m missing something.

EDIT:

I looked through a guide and I tried to Import Pool and this is what I got

I tried zpool import and this is what I got

The first disk in the mirror can’t be opened. You show both disks, or members, of that pool to be EFAX.

I would suggest disconnecting the one that isn’t working and then try the import in the degraded state to see if you can see and access the data.
If you try to import both those disks and TrueNAS tries to resilver the data, I am afraid it will cause more problems with those being SMR
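
If you do try the degraded import from the shell, one common precaution is to import read-only so ZFS cannot write to or resilver the remaining SMR disk. Below is a minimal sketch, assuming the pool name QWARD from this thread and /mnt as the alternate root. The helper name `readonly_import_cmd` is hypothetical, and it only prints the command so you can read it over before deciding to run it yourself. Keep in mind that a pool imported from the shell may not show up in the Web UI.

```shell
# Print (do not run) a read-only import command for the given pool.
# $1 = pool name, $2 = alternate root to mount under (defaults to /mnt).
readonly_import_cmd() {
    pool="$1"
    altroot="${2:-/mnt}"
    # -o readonly=on : import without allowing any writes to the pool
    # -f             : force, for a pool that was last used by another system
    # -R             : mount the pool's datasets under the alternate root
    echo "zpool import -o readonly=on -f -R $altroot $pool"
}

# Show the command for the pool discussed in this thread.
readonly_import_cmd QWARD
```

If the import works, `zpool export QWARD` releases the pool again once you have copied the data off.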

I removed 2FC and the shell still shows QWARD, and that it’s degraded, but at the same time I don’t get any options in the Import Disk screen.

Sorry for the late reply. I didn’t get an email notification about your reply and thought you forgot about me actually.

Let's try to see where you are currently. Please run these commands in the shell and post the results back using Preformatted text (ctrl+e); it looks like </> on the toolbar above where you type your replies / comments.

zpool status -v

zpool import
Last login: Fri Nov 29 23:29:43 on pts/1
FreeBSD 13.1-RELEASE-p9 n245431-b8ec9bde091 TRUENAS


root@lilNasX[~]# zpool status -v
  pool: boot-pool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors
root@lilNasX[~]# zpool import
   pool: QWARD
     id: 2389805668425095959
  state: DEGRADED
status: The pool was last accessed by another system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.gith

@HoneyBadger
Attaching other disk in mirror causes machine to crash. Looking for your expertise before attempting to import single disk for QWARD pool.