Please save my life!

Homologue · November 12, 2024, 4:09pm

As you can see below, this is my big problem. Can I solve it? How?
Thanks for any help.

root@freenas:~ # zpool import
   pool: hmg_munka
     id: 17007855792929574562
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
 config:

    hmg_munka                                       UNAVAIL  insufficient replicas
      raidz1-0                                      UNAVAIL  insufficient replicas
        gptid/c09fa596-69b0-11e7-938a-0cc47acc4dca  ONLINE
        12663129904426395595                        UNAVAIL  cannot open
        gptid/43bb1d53-196f-11e7-bc3c-0cc47acc4dca  ONLINE
        15132178709705660035                        UNAVAIL  cannot open
      raidz1-2                                      ONLINE
        gptid/3794a239-1bfb-11e7-b78a-0cc47acc4dca  ONLINE
        gptid/37f7568f-1bfb-11e7-b78a-0cc47acc4dca  ONLINE
        gptid/6123dd01-166e-11ea-8e83-0cc47acc4dca  ONLINE
        gptid/2b2005da-1bef-11ea-a407-0cc47acc4dca  ONLINE
    logs
      mirror-1                                      ONLINE
        gptid/d74c6a07-e7e7-11e6-9914-0cc47acc4dca  ONLINE
        gptid/d7655acf-e7e7-11e6-9914-0cc47acc4dca  ONLINE

HoneyBadger · November 12, 2024, 4:11pm

Hello @Homologue

You have a raidz1 with two missing devices - until you’re able to reattach at least one of them, there is little we can do.
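The redundancy arithmetic behind this can be sketched in Python. This is a minimal illustration, assuming the usual raidz rule that a raidzN vdev keeps working with up to N members unavailable, and that the pool needs every top-level data vdev to be usable. The class and function names are hypothetical and are not part of any ZFS tool. The member states are copied from the zpool import output above, and the log mirror is left out of the check.

```python
from dataclasses import dataclass
from typing import List

# Leaf states that mean a member cannot contribute data to its vdev.
MISSING_STATES = {"UNAVAIL", "FAULTED", "OFFLINE", "REMOVED"}


@dataclass
class RaidzVdev:
    name: str
    parity: int          # 1 for raidz1, 2 for raidz2, 3 for raidz3
    states: List[str]    # member states as printed by zpool import

    def missing(self) -> int:
        return sum(1 for s in self.states if s in MISSING_STATES)

    def usable(self) -> bool:
        # A raidzN vdev can reconstruct data with up to N members gone.
        return self.missing() <= self.parity


def pool_importable(data_vdevs: List[RaidzVdev]) -> bool:
    # Every top-level data vdev is required; one unusable vdev is enough
    # to leave the whole pool UNAVAIL.
    return all(v.usable() for v in data_vdevs)


# Member states copied from the zpool import output in this thread.
hmg_munka = [
    RaidzVdev("raidz1-0", 1, ["ONLINE", "UNAVAIL", "ONLINE", "UNAVAIL"]),
    RaidzVdev("raidz1-2", 1, ["ONLINE"] * 4),
]

for v in hmg_munka:
    print(f"{v.name}: {v.missing()} missing, usable={v.usable()}")
print("pool importable:", pool_importable(hmg_munka))
```

Reattaching one of the two missing members would leave raidz1-0 with a single missing device, which this check treats as usable; that matches the advice to reattach at least one of them.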

Please describe your hardware configuration as best as possible - CPU, motherboard, storage controller, drive types - as well as the events that immediately precipitated this problem - power outage or electrical surge, unexpected system shutdown, physical environment conditions (overheat) and we can attempt to determine the issue.

Homologue · November 12, 2024, 4:33pm

CPU: Xeon E3-1225 v5
Mobo: Supermicro x10sll-s-0
RAM: 32 GiB ECC
RAID: Adaptec ASR7805
All of this is in a Supermicro server chassis.

Homologue · November 12, 2024, 4:34pm

Sorry, I forgot.
Software: FreeNAS 11.3-U5

RetroG · November 12, 2024, 4:46pm

Do you see all the disks you expect on the Disks page?

It's found under Storage → Disks.

Homologue · November 12, 2024, 4:54pm

Yes.
da0-5 are the storage disks.
Sorry, I can’t attach an image; I don’t know why.

Homologue · November 12, 2024, 5:04pm

Maybe I wrote it wrong. I only see the “online” ones.

Davvo · November 12, 2024, 5:15pm

Could you post the output of camcontrol devlist?

Homologue · November 12, 2024, 5:19pm

I don’t really understand.

Homologue · November 12, 2024, 5:33pm

Understood.
How can I copy from the shell?
Sorry, I’m a real noob at this.

RetroG · November 12, 2024, 5:43pm

Use triple backticks, i.e. ```, at the beginning and end of the output when pasting into the forum.

Homologue · November 12, 2024, 5:49pm

joeschmuck · November 12, 2024, 6:00pm

If you did not remove any drives, then think back: did you disturb the server at all? Move it, open it, change something? Two drives disappearing at the same time is not a normal failure. Or had one drive failed a long time ago and been ignored, and now a second drive has failed? I am leaning towards the second possibility since you are running FreeNAS 11.3. This sounds like someone set up a server and left it alone until it flat out failed. If this is the situation, please let us know so we can give you the best advice.

Once you have posted the requested data from above, we will have something more to examine.

We are looking for 10 drives in total, including the two missing ones.
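One way to do that count is to parse camcontrol devlist, which prints each attached device with its peripheral names in parentheses at the end of the line. The sketch below is a hypothetical helper, assuming that layout and assuming disks show up as ada* or da* devices (pass* and other entries are ignored). The sample lines are placeholders, not output from this server.

```python
import re
from typing import List

# FreeBSD disk device names: ada* (ATA/AHCI) and da* (SCSI/SAS, including
# disks exposed through a RAID HBA). pass*, ses* and others are skipped.
DISK_NAME = re.compile(r"^(?:ada|da)\d+$")
# Peripheral names sit in the trailing parentheses, e.g. "(pass0,da0)".
TRAILING_NAMES = re.compile(r"\(([^)]*)\)\s*$")


def disks_from_devlist(output: str) -> List[str]:
    """Return the disk device names found in camcontrol devlist output."""
    disks = []
    for line in output.splitlines():
        match = TRAILING_NAMES.search(line)
        if not match:
            continue
        for name in match.group(1).split(","):
            name = name.strip()
            if DISK_NAME.match(name):
                disks.append(name)
    return disks


# Hypothetical sample lines in the usual layout, NOT from this server.
sample = """\
<ATA EXAMPLE-DISK 0001>            at scbus0 target 0 lun 0 (pass0,da0)
<ATA EXAMPLE-DISK 0001>            at scbus0 target 1 lun 0 (pass1,da1)
<EXAMPLE-SSD 0001>                 at scbus1 target 0 lun 0 (ada0,pass2)
"""

found = disks_from_devlist(sample)
print(f"found {len(found)} disk(s): {', '.join(found)}")
```

On the real server, comparing len(found) against the expected total shows at a glance whether the operating system sees every drive before looking at ZFS at all.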

Once we know the serial numbers of the drives listed, you can physically locate the two drives not being recognized. Maybe it was a loose power connector? It is speculation at this point.

You have a lot of good people assisting you, so listen to what they are saying. If you do not understand, ask. And per Joe’s Rules, do not assume anything. Do not say something like “The thingy turned red” and assume we know what you are trying to convey.

One last thing… I doubt you will find many, if any, users here still running FreeNAS 11. I’m not saying you should change the version you are running; I just want to be clear that most people will not have a GUI that looks like yours.

Homologue · November 12, 2024, 6:14pm

For some time the system had been reporting an error, and I saw 7 HDDs instead of 8 in the storage pool. So, after identifying the missing disk, I shut down the system and put a new HDD in place of the missing disk. After that, it started up with the error shown above.
Sorry if I’m in the wrong place, but this is all I could find.

joeschmuck · November 12, 2024, 6:26pm

First of all, you are not in the wrong place.

And I appreciate you telling us how the problem came about; it matters, and it helps us help you.

Do you still have the original failed drive? It may be needed, since it is possible you replaced the wrong drive. I hope that is what happened, as it likely leads to a fairly fast recovery.

Please post the output of glabel status.

This output allows us to cross-reference each gptid to a drive name.

Once you have this, also run the following command for each entry in the Components column (the last column) of the previous command's output, then post the results. This can be a lengthy output, so take your time.

The command is smartctl -a /dev/XYZ, where XYZ is the letters and number before the “p” in the component name. Example: for the list below you would use ‘/dev/da0’ and ‘/dev/ada2’; you drop the p and everything after it.

                                      Name  Status  Components
gptid/74e01493-127b-11eb-85e9-000c296fd555     N/A  da0p1
gptid/66431f30-d52f-11e7-ab84-0cc47ab37c5a     N/A  ada2p2
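The rule for turning glabel status output into smartctl targets can be sketched as a small parser. It assumes the three-column Name/Status/Components layout shown above and that each component is a disk name followed by a p<number> partition suffix; the function name is hypothetical. The input is the example listing from this post.

```python
import re
from typing import Dict

# A glabel status row: label, status, component (e.g. "da0p1").
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\S+)\s*$")
# A component is a disk name followed by a partition suffix "p<number>".
PARTITION = re.compile(r"^([a-z]+\d+)p\d+$")


def smartctl_targets(glabel_output: str) -> Dict[str, str]:
    """Map each gptid label to the /dev path to pass to smartctl -a."""
    targets = {}
    for line in glabel_output.splitlines():
        row = ROW.match(line)
        if not row or not row.group(1).startswith("gptid/"):
            continue  # skips the header line and non-gptid labels
        label, _status, component = row.groups()
        part = PARTITION.match(component)
        if part:
            # Drop the "p" and everything after it: da0p1 -> /dev/da0
            targets[label] = "/dev/" + part.group(1)
    return targets


example = """\
                                      Name  Status  Components
gptid/74e01493-127b-11eb-85e9-000c296fd555     N/A  da0p1
gptid/66431f30-d52f-11e7-ab84-0cc47ab37c5a     N/A  ada2p2
"""

for label, dev in smartctl_targets(example).items():
    print(f"{label}: smartctl -a {dev}")
```

Looking up a gptid from the zpool import listing in the returned dictionary then tells you which device node that pool member currently sits on.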

So, someone may be asking why I am asking for smartctl -a instead of smartctl -x. The reason is that I just want to identify the failed drives right now. Once the drives have been identified, you can try to physically locate the failing drives using the serial number for reference. The serial number is the ONLY constant here: do not assume drive “ada0” is always the same physical drive, as that can change during a reboot. The serial number is what to use.

With that data we can, at a minimum, get the serial numbers of the working drives.
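Pulling the serial number out of each smartctl -a report can be sketched like this. It assumes the report contains a “Serial Number:” line in its information section; the match ignores case because the capitalization of that label can vary between drive types. The helper names and the sample report text are placeholders, not data from this thread.

```python
import re
from typing import Dict, Optional

# The serial appears as "Serial Number:    <value>" in the information
# section; [ \t]* keeps the match from running onto the next line.
SERIAL = re.compile(r"^Serial number:[ \t]*(\S.*?)[ \t]*$",
                    re.IGNORECASE | re.MULTILINE)


def serial_number(report: str) -> Optional[str]:
    """Return the serial number from one smartctl -a report, if found."""
    match = SERIAL.search(report)
    return match.group(1) if match else None


def serials_by_device(reports: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each device name (e.g. "da0") to the serial in its report."""
    return {dev: serial_number(text) for dev, text in reports.items()}


# Hypothetical report excerpts; models and serials are placeholders.
reports = {
    "da0": "=== START OF INFORMATION SECTION ===\n"
           "Device Model:     EXAMPLE-MODEL\n"
           "Serial Number:    EXAMPLE-SN-0001\n",
    "da2": "=== START OF INFORMATION SECTION ===\n"
           "Serial number:        EXAMPLE-SN-0002\n",
}

for dev, serial in serials_by_device(reports).items():
    print(f"{dev} -> {serial}")
```

Keying on the serial rather than the device name is the point: after a reboot the same serial may turn up under a different daX or adaX name.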

Homologue · November 12, 2024, 6:46pm

How can I copy the result from the shell? Ctrl+Insert doesn’t work properly (the output falls apart and it only copies a few lines).

dan · November 12, 2024, 6:48pm

Don’t use the shell. Enable SSH and connect to the server that way. Copy/paste are among the lesser benefits of this course of action, but are still on the list.

Homologue · November 12, 2024, 7:42pm

I’m done. I hope I did it right. I usually take pictures of houses; I don’t do magic on the command line.
Thanks for the SSH hint!
ada0.txt (4.6 KB)
ada1.txt (4.6 KB)
ada2.txt (4.6 KB)
ada3.txt (4.6 KB)
da0.txt (5.4 KB)
da1.txt (5.3 KB)
da2.txt (10.7 KB)
da3.txt (5.7 KB)
da5.txt (5.3 KB)
glabel.txt (1.1 KB)
da4_.txt (5.6 KB)

joeschmuck · November 12, 2024, 8:43pm

Holy crap! 91194 hours on drive ada0. And not a single drive has had a SMART test done on it.
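For scale, 91194 hours is a little over ten years of continuous power-on time. The sketch below reads the raw Power_On_Hours value (attribute 9) from the ATA attribute table that smartctl -a prints and converts it to years. It assumes the standard ten-column table layout; the sample row is illustrative, and only the 91194 raw value comes from this thread. SAS drives report power-on time in a different format, which this hypothetical helper does not handle.

```python
import re
from typing import Optional

HOURS_PER_YEAR = 24 * 365.25  # 8766 hours


def power_on_hours(report: str) -> Optional[int]:
    """Raw Power_On_Hours (attribute 9) from an ATA smartctl -a report."""
    for line in report.splitlines():
        fields = line.split()
        # Columns: ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE,
        # UPDATED, WHEN_FAILED, RAW_VALUE -> the raw value is fields[9].
        if len(fields) >= 10 and fields[0] == "9" and fields[1] == "Power_On_Hours":
            # Some drives append extra text to the raw value; keep the leading number.
            digits = re.match(r"\d+", fields[9])
            return int(digits.group()) if digits else None
    return None


# Illustrative attribute table: only the 91194 raw value is from this thread.
report = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       91194
"""

hours = power_on_hours(report)
print(f"{hours} hours = {hours / HOURS_PER_YEAR:.1f} years")
```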

Your boot drives look to be ada0 and ada1.
Your log drives are ada2 and ada3.
Your raidz1-0 drives are da2 and da3 (two drives missing).
Your raidz1-2 drives are da0, da1, da4 and da5 (all accounted for).

Out of curiosity, does midclt call disk.query produce a result other than an error? If it does, I hope this data will help.

None of the drive data presented is from a new drive; they all have serious hours on them. I think you need to describe in detail exactly what you did. Leave no detail out. Think of it as if you need to teach us how you did it. Don’t let us assume.

Do you have the drives you removed? I think you are going to need to reinstall the failed drives to get back to your previous state. If both drives failed before you replaced the first failed drive, then, because you have a raidz1 configuration and lost two drives in the same vdev, the data is more than likely gone. A recovery service might get some of the data back, but that is big money.

I wish I had better news; hopefully someone else will have better news for you. I certainly do not know it all and learn every day.

Cheers

neofusion · November 12, 2024, 8:57pm

At this stage I hope you actually had only a single failed drive and that you then pulled the wrong drive (possibly because device names like ada1, sda1 and da1 can change between boots).

If that is true, you have a good drive that was pulled by mistake, which you can put back to return to a state with “only” a single bad drive.