Hello Forum,
this is my first post here, so hopefully I am in the right category.
My Problem:
For some reason my (first) TrueNAS SCALE NAS crashes every 0.27 days ± 10 min (very regularly).
My system:
MINIS FORUM 790S7
AMD Ryzen 9 7940HX with Radeon Graphics
2x 2 TB M.2 NVMe SSDs
1 external USB boot SSD (I know this is not optimal, but I have no choice since I do not have enough M.2 / SATA slots)
Things I tried to fix it:
reseat RAM (and run a short RAM test)
disable Apps
disable all overclocking
Some observations:
I get the "‘boot-pool’ is consuming USB devices ‘sda’ which is not recommended." error, I think because of my external boot drive
when watching the crash event, RAM usage is normal and CPU usage is normal
the web UI becomes unstable shortly before the crash, and once I saw the error "‘boot-pool’ is consuming USB devices ‘sda’ which is not recommended." reappear even though I had dismissed it after boot
after 10 min the “freeze” resolves itself and the machine runs again with no issues for another 0.27 days. You can shorten the 10 minutes by hard shutting it down and booting again
The screen is blank during the “freeze” period
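For readers who want the interval in more familiar units, here is a small sketch of the arithmetic. The function name is made up for illustration; the only inputs are the 0.27 days and ±10 min reported above. It works out to roughly six and a half hours between crashes.

```python
from datetime import timedelta
from typing import Tuple


def crash_window(days: float, tolerance_min: float) -> Tuple[timedelta, timedelta, timedelta]:
    """Return (nominal, earliest, latest) time between two crashes."""
    nominal = timedelta(days=days)
    tolerance = timedelta(minutes=tolerance_min)
    return nominal, nominal - tolerance, nominal + tolerance


if __name__ == "__main__":
    nominal, earliest, latest = crash_window(0.27, 10)
    # Prints the nominal interval and the window implied by the +/- 10 min
    print("nominal:", nominal)
    print("window: ", earliest, "to", latest)
```

Comparing this window against the uptime at each crash is a quick way to confirm that the period really is constant.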
These are the error logs connected to one of the events:
It is in the GUI, I think under Reports. Sorry, I am on the road so I can’t give you an exact answer. If the system has 30 GB free, you are correct that it should not be a problem; however, it is still worth looking at to rule it out as a cause.
You have an interesting problem. Hope it gets solved quickly.
I did install Coral drivers for Frigate, but the issue occurred before that too. I will redo a clean install and send a new / updated ticket when the next crash happens. Is that ok?
To me, this appears to be udev-related. Based on some experiences I’ve had with setting up zswap and issues with BIOS-managed power modes, you may need to ensure that you have the appropriate udev rules for your device and double-check that your USB ports or the USB drive itself aren’t entering a low power mode. For udev, I found that I needed rules that ensured certain drives were configured properly in udev with the right filesystem type and ignored by udev (in the case of zram and zswap, anyway) by including:
as part of the devices’ rules. You will also need to have the correct ENV{ID_PART_ENTRY_TYPE} set as well. I don’t know what your particular udev settings need to be, but maybe this will help point you in the right direction if the device isn’t being detected properly by udev.
Also note that if the drive appears to have “spun down” for any reason, especially if TrueNAS middleware didn’t trigger it, then TrueNAS will offline it. If the kernel doesn’t panic, it may come back online again at some point, but that behavior seems like it would be highly undefined.
Ensure you’re delivering full power to your USB port at all times, that your boot-pool isn’t configured to use any power saving modes in the TrueNAS middleware or BIOS, and that the USB device itself isn’t designed to be a low-power device or to spin itself down when not in use. I can’t guarantee that will fix your problem, but the measures I just mentioned (other than changes to udev) certainly can’t hurt.
Hi, thanks for the response. This is very interesting, as my drive does come back after some time and the crashing is very repeatable, which looks as if the power to the drive is cut off at some point (although the LED on the drive is actually still on when the freeze happens).
I am actually a complete newcomer to such nas systems and drive / usb power modes. Could you (or someone else) help me where to begin?
I searched the BIOS and the only parameter I could find was an ErP setting, which was set to “disabled”.
How can I set udev rules? I can’t find anything in the TrueNAS search. The boot device is just a USB-to-NVMe adapter with an SSD mounted inside it.
I have just done a clean reinstall and will upload another ticket when the system crashes again in 0.27 days
I’m sorry; I’m not a udev expert. If you look at the kernel archives or various Linux how-tos you might find what you need. If not, Linux & Unix Stack Exchange or ServerFault might be useful places to ask about specific udev settings.
In all honesty, though, I suspect it’s the spin-down. You can control TrueNAS-managed spin-down in the TrueNAS web client for that disk (make sure it’s disabled). As for the BIOS settings, you would need to look at the manual for your particular BIOS. For me, they were in multiple places like the PCIe and USB settings, power management settings, port settings, and the AHCI and similar settings. In other words, they were all over the place; my system happened to come with sensible defaults, but yours may not.
There may also be settings for the kernel or in the sys and proc filesystems that will allow you to disable low power modes for your device. You’d have to Google for them; I know there are some, because I’ve used them before, but if find /sys -iname "*power*" or similar doesn’t turn up something appropriate then I wouldn’t know specifically where to point you.
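As a narrower alternative to searching all of /sys, here is a minimal Python sketch that lists the runtime power-management settings the Linux kernel exposes per USB device. It assumes the standard sysfs layout (`/sys/bus/usb/devices/<device>/power/control` and `autosuspend_delay_ms`); the function name is made up, and I have not checked how TrueNAS itself manages these files.

```python
from pathlib import Path
from typing import Dict


def usb_power_settings(root: str = "/sys/bus/usb/devices") -> Dict[str, Dict[str, str]]:
    """Map each USB device directory name to its power/ settings, where readable."""
    settings: Dict[str, Dict[str, str]] = {}
    base = Path(root)
    if not base.is_dir():
        return settings  # not Linux, or sysfs not mounted
    for dev in sorted(base.iterdir()):
        entry: Dict[str, str] = {}
        for name in ("control", "autosuspend_delay_ms"):
            try:
                entry[name] = (dev / "power" / name).read_text().strip()
            except OSError:
                continue  # attribute missing or not readable for this device
        if entry:
            settings[dev.name] = entry
    return settings


if __name__ == "__main__":
    for dev, entry in usb_power_settings().items():
        print(dev, entry)
```

In `power/control`, a value of `auto` means the kernel may suspend the device when idle and `on` means it is kept awake. Writing `on` to that file as root should keep a device awake until the next reboot or replug, but this sketch only reads.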
If you have a computer that doesn’t let you access USB or link-state power (and there certainly may be some), then you will probably need to buy a powered enclosure that isn’t relying on the power from the USB port itself, or buy a different system that has more ports or gives you BIOS or OS access to USB power states. I know that’s not what you want to hear, but if you can’t find a way to control your USB power then you might not have a choice.
Alternatively, you might consider rebuilding your array and assigning one of the spinning drives to be your boot drive. You’d have to dedicate a drive for it, but it could be a good stopgap solution using the hardware you already have if you don’t have any other options.
Hello, after some research I tried to disable autosuspend for the USB ports as described here:
But this wasn’t it either. The fresh install crashed anyway, as expected, after 0.27 days. I added a second debug report to the ticket. Is that ok @awalkerix ?
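For anyone retracing this step: I can’t tell which guide was followed, but one common approach is the `usbcore.autosuspend=-1` kernel parameter. Assuming that approach, this sketch checks whether the global setting actually took effect by reading the corresponding module parameter. The function name is made up for illustration.

```python
from pathlib import Path
from typing import Optional


def autosuspend_disabled(
    param_file: str = "/sys/module/usbcore/parameters/autosuspend",
) -> Optional[bool]:
    """True if the global USB autosuspend delay is negative (disabled),
    False if autosuspend is still enabled, None if the value cannot be read."""
    try:
        value = int(Path(param_file).read_text().strip())
    except (OSError, ValueError):
        return None
    return value < 0


if __name__ == "__main__":
    print("USB autosuspend disabled:", autosuspend_disabled())
```

This only reports the default for newly attached devices. An individual device can still have its own `power/control` setting, so a `True` here does not prove the boot drive never suspends.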
Hello,
I tried troubleshooting by booting from a USB stick (same issue) and then bought a PCIe-to-NVMe adapter, which seemed to work. However, about one hour after the usual 0.27 days the system crashed again, this time completely, with these logs:
After that it would not boot, and when I tried to reinstall TrueNAS the following error was shown (fixed after formatting the SSD):
This error is telling you that your GPT partition tables are corrupted in an unfixable way. You probably need to reformat the stick and reinstall. You may also need to enable legacy USB support in your BIOS and boot from an MBR instead of GPT if TrueNAS will even let you do that; the default seems to be using the EFI partition of a GPT table.
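If you want to confirm that kind of damage yourself, here is a rough Python sketch that checks only the primary GPT header of a disk or disk image: the "EFI PART" signature at LBA 1 and the header's CRC32. It assumes 512-byte logical sectors, and the function name is made up. It does not check the partition entry array or the backup header, so an "ok" here does not prove the whole table is healthy.

```python
import struct
import zlib

SECTOR_SIZE = 512      # assumption: 512-byte logical sectors
MIN_HEADER_SIZE = 92   # minimum GPT header size defined by the UEFI spec


def check_gpt_header(path: str) -> str:
    """Return 'ok', 'bad signature', 'bad size' or 'bad crc' for the primary GPT header."""
    with open(path, "rb") as f:
        f.seek(SECTOR_SIZE)            # the primary header lives at LBA 1
        sector = f.read(SECTOR_SIZE)
    if sector[:8] != b"EFI PART":
        return "bad signature"
    if len(sector) < MIN_HEADER_SIZE:
        return "bad size"
    # Bytes 12-15: header size, bytes 16-19: stored CRC32 (both little-endian)
    header_size, stored_crc = struct.unpack_from("<II", sector, 12)
    if not MIN_HEADER_SIZE <= header_size <= len(sector):
        return "bad size"
    header = bytearray(sector[:header_size])
    header[16:20] = b"\x00\x00\x00\x00"   # CRC field is zeroed while computing
    if zlib.crc32(bytes(header)) & 0xFFFFFFFF != stored_crc:
        return "bad crc"
    return "ok"
```

Reading a raw device such as the boot disk normally needs root. The function opens the path read-only, so it should not change anything on the disk.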
You’ve been wrestling with this for a while. No one else can really help you directly because it’s an unsupported configuration on hardware the rest of us can’t replicate for you. Not all computers, USB ports, BIOS options, or enclosures will work. You can get a decent mini PC even before Cyber Monday for under US $137 that will likely work if you have an internal SSD for your boot pool and a better enclosure or better cables.
There’s absolutely nothing wrong with dancing on the edge with your hardware or configuration, but doing that risks the sort of problems you’re facing. Ultimately, your current hardware–especially your need to rely on an external USB stick that may or may not be fast enough or high quality enough–is going to keep bogging you down. It’s time to try some different hardware.
If you really want to use your current hardware anyway or to use an alternative that requires booting from USB then you may want to at least try UnRAID to see if it works with your existing hardware. I actually switched to TrueNAS from UnRAID for similar reasons, but you may have the opposite use case.
UnRAID works differently, and has different performance goals and use cases. It’s not better or worse in an objective sense; it’s just different. Assuming it works with your hardware and you don’t need the performance of striping across disks (UnRAID uses a file mover to move files from cache to a single disk in your array that’s optionally protected by parity), the only real downside of UnRAID is that there’s no community edition. It’s got a lengthy free trial, but all versions require a paid license. The license costs aren’t high, but it’s definitely non-free in both a FOSS sense and a free-as-in-beer sense. However, it may be a cheaper alternative than replacing your components if you need or want to continue working within the constraints of your current hardware.