Installation is on bare metal, updated via the web GUI.
After it failed, I removed the broken image and reverted back to 25.04.1, then downloaded the manual update file, verified the checksum, and tried again with the same result.
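(For reference, the verification step was just a standard checksum comparison; a minimal sketch with a stand-in file, since the real filename and published hash aren't reproduced here:)

```shell
# Stand-in file; substitute the real update file and the SHA-256
# hash published next to the download.
printf 'demo update payload' > TrueNAS-demo.update
sha256sum TrueNAS-demo.update > update.sha256
# Verification prints "<name>: OK" when the download is intact.
sha256sum -c update.sha256
```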
What kind of logs would help debug this, and what other ways are there to upgrade?
I recently added a Hailo-8 PCIe card, but 25.04.1 booted fine with it prior to the update attempt, and it shows up in lspci. Could added-but-unused hardware cause an issue like this?
I did run a scrub and removed a few older boot environments before trying again with the manually downloaded 25.04.2.1 update. Good call on the memtest; going to run that now. Thank you for your help!
Two passes of memtest took over 3 hours; both finished with 0 errors.
Ran an extended SMART test on the boot drive as well; 0 errors there too.
The hardware is a Morefine S500+ mini PC: Ryzen 9 5900HX with 64 GB of G.Skill DDR4-3200 RAM, a 128 GB SATA SSD boot drive (an older Samsung 830 that was previously used in a laptop but never saw a huge amount of writes), and two Seagate FireCuda 2 TB SSDs for the storage pool.
One thing I did notice is that the broken boot environment for 25.04.2.1 was significantly larger than the older ones. I didn't write down the number before deleting it, but it was 3.x GB, while the older ones were between 2.6 and 2.8 GB.
The SATA SSD has 128 GB and is used exclusively for the OS. As already mentioned, I did delete a few older boot environments before trying the manually downloaded update file.
What would the next steps be? Trying 25.04.2.0? Or removing the extra hardware and retrying 25.04.2.1? Is there a logfile or some other way to find out what's broken? On boot it just says "initramfs checksum failed, press any key".
Continuing leads to "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)".
Stats/Settings
Boot Pool Condition: Degraded Check Alerts for more details.
Size: 118 GiB
Used: 10.78 GiB
Last Scrub Run: 2025-08-22 14:44:17
The degraded status is for 31 write errors on sda3; it didn't go away after a scrub.
The alert says
Critical
Boot pool status is DEGRADED: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected…
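(Worth noting, assuming standard OpenZFS behaviour, which TrueNAS SCALE uses: a clean scrub does not reset the per-device error counters, so the DEGRADED status and the 31 write errors will stick around until cleared explicitly:)

```shell
# Assumption: standard OpenZFS behaviour. A scrub verifies/repairs data
# but leaves the accumulated READ/WRITE/CKSUM counters in place.
zpool status -v boot-pool   # show per-device error counters
zpool clear boot-pool       # reset the counters; the pool returns to
                            # ONLINE unless new errors occur
```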
Noticed the message about an unknown AMD-Vi option in my "screenshot" and went spelunking in the BIOS: found "AMD CBS" → "NBIO COMMON" → "IOMMU" and changed it from [AUTO] to [ENABLED].
Tried installing 25.04.2 after removing the new PCIe card; the same error occurred for the regular boot option. However, the advanced boot option worked. (I never tried that with 25.04.2.1.)
Here's lscpu output, which shows AMD-V but not AMD-Vi (which is a shame, because it might mean I can't pass the PCIe card through to a VM/app?).
But given that this failure happens without the new card, and I hadn't activated any passthrough options for it yet, that shouldn't be the cause.
Looking at the GRUB config in /boot/grub/grub.cfg, the only difference between the default and advanced options seems to be the image used (production vs. debug).
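(To make that concrete, here's the kind of comparison involved; the menuentry contents below are a hypothetical sample, not copied from the real grub.cfg:)

```shell
# Hypothetical grub.cfg fragment, just to illustrate the shape of the
# entries; the real paths on a TrueNAS install will differ.
cat > /tmp/grub-sample.cfg <<'EOF'
menuentry 'TrueNAS SCALE' {
        linux /@/boot/vmlinuz-6.12.15-production+truenas
        initrd /@/boot/initrd.img-6.12.15-production+truenas
}
menuentry 'TrueNAS SCALE (Advanced)' {
        linux /@/boot/vmlinuz-6.12.15-debug+truenas
        initrd /@/boot/initrd.img-6.12.15-debug+truenas
}
EOF
# Only the linux/initrd lines decide what actually gets booted; run
# this against the real /boot/grub/grub.cfg to compare the entries.
grep -E '^[[:space:]]*(linux|initrd)' /tmp/grub-sample.cfg
```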
Still happening with the latest 25.04.2.3. Is there a way to get the output of whatever generated the initrd image?
Looking at the files themselves, I find it a bit suspicious that the one that's failing (6.12.15-production) was modified on Jun 27, while all the others were modified on May 26:
root@truenas:/home/admin# ls -la /boot/*
-rw-r--r-- 1 root root  7187285 May 26 14:52 /boot/System.map-6.12.15-debug+truenas
-rw-r--r-- 1 root root  7135076 May 26 14:54 /boot/System.map-6.12.15-production+truenas
-rw-r--r-- 1 root root   265130 May 26 14:52 /boot/config-6.12.15-debug+truenas
-rw-r--r-- 1 root root   264722 May 26 14:54 /boot/config-6.12.15-production+truenas
-rw-r--r-- 1 root root       15 Jun 27 11:51 /boot/initramfs_config.json
-rw-r--r-- 1 root root 79702016 May 26 16:45 /boot/initrd.img-6.12.15-debug+truenas
-rw-r--r-- 1 root root 78241680 Jun 27 11:52 /boot/initrd.img-6.12.15-production+truenas
-rw-r--r-- 1 root root  9761280 May 26 14:52 /boot/vmlinuz-6.12.15-debug+truenas
-rw-r--r-- 1 root root  9896448 May 26 14:54 /boot/vmlinuz-6.12.15-production+truenas
Is it safe to run update-initramfs, and are there any required parameters to pass?
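(For context: on plain Debian the usual invocation would be `update-initramfs -u -k <kernel-version>`, with `-v` for verbose output, but TrueNAS generates its boot images through its own update tooling and, as far as I know, keeps the root filesystem read-only, so hand-running it may not stick. A less invasive first check is whether the image even decompresses cleanly; the helper below is a sketch of my own, not a TrueNAS tool:)

```shell
# check_initrd: exit 0 if a compressed initrd image decompresses
# cleanly. The helper name and the format cases are assumptions;
# real initrds can be concatenated cpio archives, so this is only
# a sanity check, not a full validation.
check_initrd() {
  case "$(file -b "$1")" in
    gzip*)      gzip -t "$1" ;;
    Zstandard*) zstd -t "$1" 2>/dev/null ;;
    *)          return 2 ;;  # unrecognized format
  esac
}
# Example: check_initrd /boot/initrd.img-6.12.15-production+truenas && echo intact
```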
Your boot-pool was degraded. This could be due to a failing drive or faulty memory.
Trying to pinpoint different initramfses or messing with config files seems like you're skipping some of the fundamentals and making this too complicated.
The scrub revealed errors on the boot-pool? It appears so.
What about the results of full memtest passes? I would get that done first. You don't want to replace a boot drive only for this issue to come back because of bad RAM; if it passes full memtests without any errors, you can at least rule that out.
Do you have another drive to use for a boot-pool instead?
As mentioned above, I did run two full passes of memtest, which took over 3 hours and found no errors.
I also did run a scrub; just ran it again:
/home/admin# zpool status boot-pool
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:56 with 0 errors on Fri Aug 29 18:21:02 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sda3      ONLINE       0     0     0

errors: No known data errors
I installed three different versions (25.04.2, 25.04.2.1, 25.04.2.3), all showing the same issue: the debug image boots but the production image does not.
I'd like to rule out software issues before replacing the boot disk…
So I'd love to be able to see the logs of the update process and understand whether it created a faulty image (and if so, why that only failed after reboot instead of the checksum being validated immediately and the update failing/cancelling).
I'm in the lucky position of being able to move the hardware from my basement to my desk and check the boot screen after connecting a monitor, but it could be a lot worse if I weren't close to the hardware…
Notably, all the dates and the file size for initrd.img-6.12.15-production+truenas have changed.
Also, the used space for the 25.04.2.3-1 boot environment shown in the web UI has gone down to 2.81 GB, which is in line with the size of 25.04.1 (previous 25.04.2 versions showed 3.46 GB).
I'd still love to learn what exactly went wrong and how to get a more verbose log from the update process. It would also be great if the final result were validated somehow to prevent rebooting into an unusable state.