Scale Electric Eel Crashing

I have been running into an issue on both 24.10.0.2 and 24.10.1 where the whole system locks up and crashes. I can’t SSH in or access the web GUI, and regaining control of the system requires a forced reboot by cutting power. Does anyone have an idea? I have provided snippets of the syslogs and a picture of the direct output from the machine. It’s running a Ryzen 7 1700, 64GB DDR4, and a GTX 1660 for output (but I am not using any GPU acceleration). I’m using a USB drive enclosure for my storage (4x4TB IronWolf Pros in a RAIDZ2) and I have mirrored 1TB SATA SSDs for the app pool storage. Primarily this serves as a Minecraft and WireGuard server. I’ve been tinkering with Jellyfin, but the crashing persists with Jellyfin stopped. There seems to be no rhyme or reason as to why the crashing occurs. Apologies for the awful photo of the physical output; it was the best I could do in the moment. Any help would be appreciated.

Will this media upload?



Nothing in the screenshots says anything definitive. The last one might be a full system panic, but I am not sure.

But aside from the dump, one earlier message is concerning:
Device /dev/sda [SAT], SMART Usage Attribute 194: Temperature_Celsius changed from 73 to 74

That temperature seems rather too high.
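One thing worth ruling out before reworking airflow: by default smartd tracks the normalized value of an attribute, not the raw one, so the 73 and 74 in that log line may not be degrees Celsius at all. The sketch below pulls both numbers for attribute 194 out of `smartctl -A` style output so they can be compared side by side. The sample text is invented for illustration (it is not from this system), the function name is hypothetical, and the column layout is assumed to be the usual ATA attribute table.

```python
# Invented sample in the usual `smartctl -A` table layout; not data from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4321
194 Temperature_Celsius     0x0022   074   060   000    Old_age   Always       -       26 (Min/Max 18/40)
"""


def temperature_fields(smartctl_output):
    """Return the normalized and raw values of attribute 194, or None if absent."""
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Column 0 is the attribute ID, column 3 the normalized VALUE,
        # column 9 the first token of RAW_VALUE.
        if len(parts) >= 10 and parts[0] == "194":
            return {"normalized": int(parts[3]), "raw": int(parts[9])}
    return None


if __name__ == "__main__":
    print(temperature_fields(SAMPLE))
```

If the normalized value on the real drive is in the 70s while the raw value is in the 20s or 30s, the log line is probably not reporting an overheating drive.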

So the SMART data is only reporting a high of 32 C on that specific pool. I’ll run another SMART test and monitor those specific drives. That’s one of the mirrored SSDs, so maybe I’ll need to look into better airflow in their location to see if that’s the issue. I’m just curious as to why full system panics are happening. The syslogs aren’t showing anything definitive, and I can’t seem to find any full error traces anywhere.
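A hard lock-up often leaves nothing useful in the live logs, so one place to look for a trace is the journal of the boot that crashed. This is a minimal sketch, assuming the system keeps a persistent systemd journal (the thread does not confirm that for this setup). The helper names are made up for illustration; the `journalctl` options used are `--boot`, `--priority`, `--dmesg`, `--lines`, and `--no-pager`.

```python
import subprocess


def previous_boot_cmd(boots_back=1, priority="err", kernel_only=False, lines=50):
    """Build a journalctl command for an earlier boot (1 = the boot before this one)."""
    cmd = [
        "journalctl",
        "--no-pager",
        f"--boot=-{boots_back}",   # negative offsets count back from the current boot
        f"--priority={priority}",  # show this priority and anything more severe
        f"--lines={lines}",        # only the last N matching entries
    ]
    if kernel_only:
        cmd.append("--dmesg")      # restrict output to kernel messages
    return cmd


def tail_previous_boot(**kwargs):
    """Run the command and return its output as text (empty if nothing was logged)."""
    result = subprocess.run(previous_boot_cmd(**kwargs), capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(tail_previous_boot(kernel_only=True))
```

If the machine froze so hard that nothing was flushed to disk, this will come back empty, which is itself a hint that the hang is at the hardware or power level.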

That is one very big screen for a NAS!

Have you tried to roll back to 24.10.0.2? If not, give it a try. This will give you confidence your system still works.

Then make sure you export and save all your configuration data, and try the upgrade again. If you have the same issues, you have a few options:

  1. Run Memtest86+
  2. Run a CPU stress test
  3. If these pass, step 4…
  4. Using a USB drive as a boot drive (for testing only), do a fresh install of 24.10.1 from an ISO, then restore your config file. The system should reboot.

Did that fix it?

Sometimes an upgrade fails for a reason I’m not aware of, and an ISO install gets you past it. If that works then give it a few days to ensure it is stable. If it is, you can then install to your normal boot drive. If the problem comes back, suspect the boot drive.

The issue persists on both 24.10.0.2 and 24.10.1. I’ll give the rest of this a try, though.

There have been numerous posts by people running 24.10.X on Ryzen CPUs specifically complaining about “random reboots when idle”, a key symptom of an older Ryzen power issue.

When going idle, the CPU/motherboard incorrectly tells the PSU to lower the supplied power too far. The issue occurs when the power drops so low that the CPU can’t wake up again.

Update the BIOS and set Power Supply Idle Control to Typical. This is the best option.
If you can’t update the BIOS, try just disabling C6-states.

There’s a fair chance the above will resolve some of the crashes.
Having said that, running memtest and the like, as the others have suggested, is also a great universal tip, especially given the kernel panics.
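To see which idle states the kernel currently exposes (relevant to the C6 suggestion above), the Linux cpuidle sysfs tree can be read without changing anything. This is a rough sketch, not a fix: the function name is made up, and the state names the kernel reports (for example C1, C2) do not necessarily map one-to-one onto the hardware C6 state, so the BIOS setting remains the actual remedy described above.

```python
from pathlib import Path


def list_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (directory, state name, disabled?) for each idle state of one CPU."""
    base = Path(cpuidle_dir)
    states = []
    # Sort numerically so state10 comes after state9 rather than after state1.
    for state in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((state.name, name, disabled))
    return states


if __name__ == "__main__":
    for entry in list_idle_states():
        print(entry)
```

An empty list simply means the directory was not found, for instance when no cpuidle driver is loaded.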


I too was experiencing consistent crashing after upgrading to Electric Eel. Yes, I’m running a Ryzen: Ryzen 7 1700X with 32GB of RAM. I could go about 32-36 hours before a crash, consistently. I rolled back to Dragonfish-24.04.2.3 and so far the system has been active for almost 4 days without a crash. If it was a simple case of a Ryzen Power Issue, wouldn’t that exist on Dragonfish as well?

I’m going to try updating the BIOS; that was already a thought because of how old the board is. I will post an update after some time.

I don’t think so, if the idle power draw in Electric Eel got reduced in comparison to Dragonfish.

Unrelated, that is not recommended and often a source of trouble. I guess it does show the drives individually, otherwise you would not be able to create the RAIDZ2. Or do you attach four individual drives?

Yes, we’ve seen a sharp rise of people with older Ryzens posting in the forum having this exact issue after upgrading to 24.10.

Presumably the newer version is better at letting the CPU sleep (newer kernel, no more resource-intensive Kubernetes, etc.), which is exactly where the instability starts rearing its ugly head.

The drive enclosure? It has given me no issues, and it’s basically a SATA backplane that communicates with the system over USB. I had no problems when making the RAIDZ2.


This was my thought after the sleep state issue was brought to my attention. I updated the BIOS literally 5 minutes ago and am going to give this a couple of days as a test. If it crashes again, I have a newer CPU I can try (3rd gen Ryzen; I have been lazy about upgrading it).

The 3700X I used in my Windows-based gaming system had these issues as well, sadly. Not sure when, or if, they fully fixed it.

I’m using a 3700X in my TrueNAS box and haven’t had any issues for about 2 years…

My BIOS is at the latest. I hope I’m not forced to update hardware just to use TrueNAS. Isn’t part of the appeal of TrueNAS being able to “recycle” older hardware to create your own NAS? Also, the OS is the same, but TrueNAS is the difference. Is ElectricEel deploying a new Linux Kernel in the process? Sorry, but I’m not a coder/developer, and tend to ignore that stuff unless/until it’s a problem.

Good, but did you also set the setting I mentioned to Typical?
I mentioned updating the BIOS so that you would have the option available to you.

For older hardware that never received BIOS updates giving you the new option, you instead turn C6-states off.

In a server chassis the temperatures are significantly reduced. This is three of six drives in a DL380eG8 ProLiant server. You can’t compete with the price, and CPUs are cheap by comparison… dual e5-24?? for anywhere between $10 and $50. I use the low-power E5-2450L and have lots of cores running at 1.8 GHz and a fast bus speed…

Ignore the advice at your peril, but it is in your favor that the USB controller in the enclosure will fail in a way that probably won’t kill the pool.

I don’t see your exact hardware, but my experience with this from 8 years ago is that a busy pool has about 3-6 months of running time before it burns out the USB controller.

That may already be happening in part, which could be contributing to the issues you’re seeing. (I do think it’s more likely the C6 states thing or the BIOS, though.)

If you can run without that USB enclosure pool for a few days, that might be a test to try if the other stuff doesn’t solve it.

Best of luck with the USB enclosure. (And ideally with finding an alternative.)
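If pulling the enclosure for a few days is not practical, a cheaper first check is to look through the kernel log for USB resets or disconnects on the port the enclosure hangs off. This is a minimal sketch under the assumption that the kernel logs the usual "reset ... USB device" and "USB disconnect" messages; the sample lines in the demo are invented, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
#   usb 2-1: USB disconnect, device number 2
USB_EVENT = re.compile(r"usb (?P<port>\S+): (?:reset .*?USB device|USB disconnect)")


def usb_trouble_by_port(log_lines):
    """Count reset/disconnect events per USB port path found in the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = USB_EVENT.search(line)
        if match:
            counts[match["port"]] += 1
    return dict(counts)


if __name__ == "__main__":
    demo = [
        "[ 1234.5] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd",
        "[ 1300.1] usb 2-1: USB disconnect, device number 2",
        "[   10.0] usb 1-3: new high-speed USB device number 4 using xhci_hcd",
    ]
    print(usb_trouble_by_port(demo))
```

A steady trickle of resets on the enclosure’s port would point toward the USB path; a clean log does not prove the enclosure is innocent, since a full lock-up may prevent anything from being written.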

I am going to agree with @sretalla that using a USB enclosure is going to cause nothing but grief and aggravation. I had a newer mini PC (AMD Ryzen 7530U) with a USB-C port that I tried for months to use with multiple USB drive enclosures. All of them resulted in some type of lock-up or actual volume corruption. Just buy a new chassis with a SATA/SAS backplane that can be connected either directly to the motherboard or to a SAS HBA. Look up Jonsbo; they have a great variety of options. You will stop wasting time trying to diagnose the randomness of USB drive enclosures. Just say no to USB drive enclosures; they are not worth the headache.
