Won't boot past 24.04.0

I have an otherwise solid TrueNAS rig that I’ve been using for a couple years (installed at version 22.02.4). It has been solid and I have it configured in a way I like.

I never had any problem upgrading until after 24.04.0. Ever since that version, any upgrade results in the system freezing at “Loading initial ramdisk …”

I can continue to boot 24.04.0, and have newer versions “installed” but never booted. After an upgrade, I just manually boot (via grub) 24.04.0 and change the default boot environment back to 24.04.0. I currently have boot environments for 24.04.1.1 and 24.10.0.2 “installed” but they won’t boot. I really want to get up to Electric Eel and start using the Docker apps, but can’t get past this.

I would very, very much prefer to not reload and import settings, VMs and apps if I don’t have to. I can’t find any other forum topics with this same symptom and I don’t have any idea what could be the issue here, although I’m relatively proficient in Linux and related technologies.

Any troubleshooting direction would be very much appreciated.

Let’s start with posting the hardware configuration.

There are some ideas in this thread.

I wouldn’t worry too much about the effort required to reinstall TrueNAS.

  1. Ensure you have a backup of your system configuration.
  2. Download the 24.04 version of SCALE (if you have apps I think you would almost certainly be better off going to Dragonfish first and then upgrading to EE).
  3. Install 24.04, import the system configuration file.

You should be back to a fully working system without needing to reload any settings, VMs or apps. :smiley:

I’m not too worried about it because I’ve done it before on test systems and it has worked out. I have an additional concern, though: say this works, but then future upgrades show the same behavior. I would rather solve the root cause, or at least understand it, while I have this opportunity in front of me. Others might have the same issue, and my troubleshooting might either expose a bug that can be fixed, to the benefit of all, or at least let me post a workaround or successful remediation here. If it comes down to it, I will go this route; I’m just not quite there yet.

This is very interesting - I noticed that the first reply I saw in there noted that there were multiple GPUs on an affected user. I use the on-board (Intel) GPU for display, but also have an NVIDIA card in here that I pass through to a container for some computational stuff.

What HW config elements would be helpful? (short of posting an “lshw” or “lspci” output here - or is that a good idea?)

I’m kind of homing in on that second graphics card as the target of interest at this point…

You are probably right. You are on your own until you post detailed hardware specs.


Looks like lshw is not included in TrueNAS. Here is the output from lspci, which should be pretty much everything pertinent:

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d4)
00:1c.3 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 (rev d4)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Q87 Express LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2)
01:00.1 Audio device: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] (rev a1)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8070 based Ethernet Controller (rev 16)

CPU is “Intel(R) Core™ i5-4590 CPU @ 3.30GHz”

… there are a lot of bug fixes between 24.04.0 and 24.10.1

I’d suggest removing the NVIDIA card… going through the upgrade path and then adding the NVIDIA card again.

Consider disabling any hardware not essential to the system in the BIOS.

OK - I tried removing the Nvidia card and disabled everything in the BIOS (USB ports even) and it didn’t help.

However, when I enabled Legacy boot (Secure boot was NEVER enabled, so didn’t need to try disabling that) it did something different. It still would not boot, but it did more, not necessarily better…

Loading Linux 6.6.44-production+truenas ...
Loading initial ramdisk ...
error: checksum verification failed.

Press any key to continue...

However, within 5 seconds it continued on its own and tried to start the system, eventually failing at:

[    0.736840] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---

Note: I removed the extraneous hardware then attempted a new upgrade from 24.04.0 to 24.10.1 which led to this, the same outcome as previous upgrade attempts.

Also, turning Legacy Boot back off returned to the old behavior of not showing any of the information. Whether Legacy Boot is kept off or on, the system still successfully boots into 24.04.0.

So, from what I can tell, it isn’t able to load the initramfs. This looks like either a problem with the bootloader configuration or the bootloader being unable to read the storage that holds the initramfs or kernel.

TrueNAS Scale seems to do things a little differently than normal Debian on this side of things, so I’m not exactly sure how to troubleshoot the issue. Anyone have any ideas on direction here?

UPDATE: I want to mount the boot-pool to take a look in it and see if the kernels and all exist, but when I try to mount it I get this:

#  mount /dev/disk/by-label/boot-pool /tmp/tmpmnt
mount: /tmp/tmpmnt: unknown filesystem type 'zfs_member'.
       dmesg(1) may have more information after failed mount system call.

Maybe this is part of the problem…?

UPDATE2: I was able to get another boot environment mounted using:

#  mount -t zfs boot-pool/ROOT/24.10.1 /tmp/tmpmnt

Looking in /tmp/tmpmnt/boot/, I can see that the necessary boot files exist:

[/tmp/tmpmnt/boot]# ls -la
total 186190
drwxr-xr-x  3 root root       12 Dec 24 16:56 .
drwxr-xr-x 21 root root       29 Dec 16 15:58 ..
-rw-r--r--  1 root root  6294129 Dec 16 13:57 System.map-6.6.44-debug+truenas
-rw-r--r--  1 root root  6242684 Dec 16 14:38 System.map-6.6.44-production+truenas
-rw-r--r--  1 root root   254983 Dec 16 13:57 config-6.6.44-debug+truenas
-rw-r--r--  1 root root   254579 Dec 16 14:38 config-6.6.44-production+truenas
drwxr-xr-x  2 root root        2 Dec 16 16:00 grub
-rw-r--r--  1 root root       15 Dec 24 16:55 initramfs_config.json
-rw-r--r--  1 root root 76069664 Dec 24 16:56 initrd.img-6.6.44-debug+truenas
-rw-r--r--  1 root root 74549362 Dec 24 16:55 initrd.img-6.6.44-production+truenas
-rw-r--r--  1 root root  9183744 Dec 16 13:57 vmlinuz-6.6.44-debug+truenas
-rw-r--r--  1 root root  9323008 Dec 16 14:38 vmlinuz-6.6.44-production+truenas
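
Listing the files only shows that they exist, not that every block can be read back. As a rough sketch (not anything TrueNAS ships), the following reads each kernel and initrd end to end from the mounted boot environment and hashes it. The function names and the `vmlinuz-` / `initrd.img-` prefixes are my own choices, the prefixes taken from the listing above. If ZFS cannot return a block because of an unrecoverable checksum error, the read fails with an I/O error, which shows up here as an `OSError`. A clean read does not prove GRUB can read the same file, since GRUB uses its own ZFS reader.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Read the whole file; return (bytes_read, sha256 hex digest)."""
    digest = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()


def check_boot_files(boot_dir):
    """Hash every kernel and initrd in boot_dir.

    Returns {file name: (size, sha256)} on success, or
    {file name: OSError} when the file could not be read in full.
    """
    results = {}
    for path in sorted(Path(boot_dir).iterdir()):
        # Prefixes are an assumption based on the directory listing.
        if path.is_file() and path.name.startswith(("vmlinuz-", "initrd.img-")):
            try:
                results[path.name] = hash_file(path)
            except OSError as exc:
                results[path.name] = exc
    return results
```

Calling `check_boot_files("/tmp/tmpmnt/boot")` against the mount point used above would return one entry per kernel and initrd; any value that is an exception rather than a `(size, digest)` pair points at a file the pool could not serve.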

I did notice that TrueNAS is saying that my boot-pool is degraded, which is kind of weird because it is a single disk (SSD). Maybe I need to troubleshoot that more, but still doesn’t explain why it will boot one image but nothing newer…
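
To keep an eye on that degraded state while testing, a small helper can pull the pool states out of `zpool status` text. This is only a sketch: `pool_states` is a name I made up, and it assumes the usual `pool:` / `state:` header lines that `zpool status` prints for each pool.

```python
import re


def pool_states(status_text):
    """Map pool name -> state (e.g. ONLINE, DEGRADED) from `zpool status` text."""
    states = {}
    current_pool = None
    for line in status_text.splitlines():
        match = re.match(r"\s*(pool|state):\s*(\S+)", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "pool":
            current_pool = value
        elif current_pool is not None:
            states[current_pool] = value
    return states
```

The text could come from `subprocess.run(["zpool", "status", "boot-pool"], capture_output=True, text=True).stdout`. Running `zpool status -v boot-pool` by hand also lists any files with permanent errors, and `zpool scrub boot-pool` re-verifies every block on the pool.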

I was able to video the failing boot when Legacy was enabled. I see an error that I’m sure is related. Although I don’t believe it describes the root cause, hopefully it can help narrow it down.

I believe it reads (transcribed from a frame in a video because it is only visible for a split second):

[    0.016740] Unknown kernel command line parameters "BOOT_IMAGE=/ROOT/24.10.1@boot/vmlinuz-6.6.44-production+truenas zfsforce", will be passed to user space.

Later on there is something like “Please append a correct “root=” boot op…” followed by the initial “Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)”

It seems like grub is either somehow messing up the boot parameters (seems unlikely, but the error seems to suggest this) or the filesystem is not available (seems more likely - how to troubleshoot?)
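
One caveat on reading that message: the kernel’s “Unknown kernel command line parameters” line lists only the parameters it did not recognize, so `root=` not appearing there does not by itself prove it was missing. A more direct check is to compare the full command line of the working environment (`/proc/cmdline`) with the `linux` line of the failing menu entry in grub.cfg. Below is a minimal sketch for that comparison; `parse_cmdline` and `has_root` are hypothetical helper names, and the `root=ZFS=...` form mentioned afterwards is my assumption about what a healthy entry looks like, not something confirmed in this thread.

```python
import shlex
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}.

    Bare flags such as `zfsforce` map to None. Only the first `=` separates
    name from value, so `root=ZFS=pool/ds` keeps `ZFS=pool/ds` as the value.
    """
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_root(cmdline: str) -> bool:
    """True if the command line carries a non-empty root= parameter."""
    return bool(parse_cmdline(cmdline).get("root"))
```

Feeding it the contents of `/proc/cmdline` from the running 24.04.0 system and the `linux ...` arguments of the 24.10.1 entry would show whether the two differ in anything besides the version in the paths. If the failing entry also has a sensible `root=` (for example something of the form `root=ZFS=boot-pool/ROOT/24.10.1`), the panic is more likely caused by the initrd never being loaded than by a bad parameter.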
