Dragonfish 24.04.1.1 GUI goes unresponsive

I have not, and yes, while it’s old hardware, the SSD DOM drives were fairly new and performed without issue until I moved to the release version of Dragonfish. My other system, running CORE, is identical and runs without issue. This system previously ran the Dragonfish RC without issue. The problems only started after moving to the release version, as other folks have reported.

Checked that as well. Nothing on the console but the normal number menu of options.

I’m talking about doing things like getting shell and checking if it is responsive, investigating for networking issues, etc.

I have actually not tried the shell from the console, only remotely via SSH or the web GUI. From what I’ve seen, the network stack is up and working: I can ping the system from other machines and get a response, but when I try to log in it just sits there. The web GUI also just spins, so the connection is there, since it doesn’t say not responding. I have an SSH session running htop on the advice of one of the admins here, and I have opened a ticket with my debug file attached.

Locked up again this weekend. I checked the shell from the console and it was unresponsive: I selected the shell option from the menu and it just sat there and never gave me a prompt. The htop I was running in an SSH session appears to have died and kicked me out, but it would not give me a prompt either. It just sits there.

[image: htop screenshot]

Managed to grab a copy of my htop before it crashed again. Looks like the ARC is filling up.

I replied elsewhere, but that screenshot shows a LOT of free memory still. Your previous message shows that you can’t even launch htop when it goes catatonic. I’m looking for something pointing to 100% memory utilization. I am a tad concerned about the SATA DOM you are using. Those are notorious for failures, and it could be that the boot device is stopping IO for entirely different reasons, which would better fit the symptoms described so far.
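
For anyone wanting to check this while the box is still responsive, here is a minimal sketch (not an official TrueNAS tool) that compares the ARC size with total and available memory. It assumes the usual Linux OpenZFS locations `/proc/meminfo` and `/proc/spl/kstat/zfs/arcstats`. The function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of byte counts."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        amount = int(fields[0])
        # Most entries are in kB; a few (e.g. HugePages_Total) are plain counts.
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def parse_arcstats(text):
    """Parse arcstats-style text ("<name> <type> <value>" rows) into a dict."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Header rows do not have exactly three fields ending in a number.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def memory_summary(meminfo_text, arcstats_text):
    """Return total/available memory, ARC size, and both as percentages."""
    mem = parse_meminfo(meminfo_text)
    arc = parse_arcstats(arcstats_text)
    total = mem["MemTotal"]
    available = mem["MemAvailable"]
    arc_size = arc.get("size", 0)
    return {
        "total": total,
        "available": available,
        "arc_size": arc_size,
        "used_pct": 100.0 * (total - available) / total,
        "arc_pct": 100.0 * arc_size / total,
    }


def live_summary():
    """Read the live files (assumed paths) and summarize them."""
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        arcstats_text = f.read()
    return memory_summary(meminfo_text, arcstats_text)
```

Logging `live_summary()` every minute or so to a file on a data pool (not the boot device) would show whether `used_pct` is actually near 100 before a hang. A large `arc_pct` by itself is normally expected, since the ARC is meant to shrink under memory pressure.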

If it were the SATA DOM, wouldn’t other services lock up besides just the GUI and SSH? All the other services on the box keep running and are not impacted.

No. Services already up and running in memory can stay running for quite some time. The symptom you usually see is that when you SSH in and try to run commands that are not already in the memory cache, they tend to hang. Middleware / WebUI are usually the first victims because they load specific things on demand and periodically, so if boot goes catatonic they will halt.
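
A rough way to test for this kind of stall is to run a small read against a file on the boot pool in a background thread and see whether it returns within a deadline. This is only a sketch: `timed_call` and `probe_read` are hypothetical helper names, the path is whatever you choose, and a file already in the page cache can return instantly even if the device is stalled, so pick something that has not been read recently.

```python
import threading
import time


def timed_call(func, timeout):
    """Run func() in a daemon thread and wait at most `timeout` seconds.

    Returns a dict with:
      finished: True if func returned or raised before the deadline
      error:    the exception raised by func, or None
      elapsed:  seconds spent waiting
    A thread stuck in blocked IO cannot be cancelled; it is simply abandoned.
    """
    done = threading.Event()
    outcome = {"error": None}

    def worker():
        try:
            func()
        except Exception as exc:
            outcome["error"] = exc
        finally:
            done.set()

    start = time.monotonic()
    threading.Thread(target=worker, daemon=True).start()
    finished = done.wait(timeout)
    return {
        "finished": finished,
        "error": outcome["error"],
        "elapsed": time.monotonic() - start,
    }


def probe_read(path, nbytes=4096, timeout=10.0):
    """Try to read the first `nbytes` of `path` within `timeout` seconds."""

    def read_once():
        with open(path, "rb") as f:
            f.read(nbytes)

    return timed_call(read_once, timeout)
```

If `probe_read(...)["finished"]` comes back `False` for a file on the boot pool while the same probe against a data pool returns promptly, that would point at the boot device rather than at memory.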

@kris thanks for this, it helps. I’ve got a brand new SATA DOM that I can give a shot, but if I do, it will be a fresh install, which might skew the debugging if the problem goes away and it is actually tied to the upgrade from CORE to Dragonfish RC to Dragonfish Release.

Try adding one SATA device as a mirror of the other. This can be done in the GUI.

Then if you want you can remove the original device. Or not.

Wow… so what is iXsystems using these days as a boot drive in their Minis?

My MiniXL shipped with a 16GB SATADOM drive, IIRC. I have since upgraded to a mirrored pool of 64GB SM branded SATADOMs, likely twins to the ones used by @Spunky17

I’d be velcroing a proper SSD somewhere into the case

Also, you may find this interesting:

https://ixsystems.atlassian.net/issues/NAS-100168

That’s a bit discouraging, especially since neither CORE nor SCALE GUI allow anything beyond a 2-way mirror. The CLI allows more, IIRC.

Not middlewared on SCALE, when I tried it.

But unplug one. Then replace the removed device. And now you have a physical backup :wink:

We’ve moved to a single M.2 device for boot these days. M.2 reliability is generally really good. SATA DOMs work, but there tend to be a lot of quality issues with them, just like with USB sticks.

That’s what I wanted to do with a USB stick in a 3-way mirror - add it, have it resilver, then remove & store the stick offsite.

Ha, love it. Post a problem, get asked to open a bug report, get told it’s a hardware issue, and have the bug report closed with no explanation of how it’s a hardware issue. Rebooted the system back into the Dragonfish RC and it has been running just fine since. Yeah, right, hardware problem… :roll_eyes:

@Spunky17 Can you DM me the ticket? The one I’m monitoring here is still open…

Just sent it over. It shows as closed, saying it’s a hardware issue, but doesn’t say how they got to that conclusion.