Installation is on bare metal, updated via the web GUI.
After it failed, I removed the broken image and reverted back to 25.04.1, then downloaded the manual update file, verified the checksum, and tried again with the same result.
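(For reference, the verification step was just a standard checksum comparison; a minimal sketch with a stand-in file, since the real filename and published hash aren't reproduced here:)

```shell
# Stand-in file; substitute the real update file and the SHA-256
# hash published next to the download.
printf 'demo update payload' > TrueNAS-demo.update
sha256sum TrueNAS-demo.update > update.sha256
# Verification prints "<name>: OK" when the download is intact.
sha256sum -c update.sha256
```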
What kind of logs would help debug this, and what other ways are there to upgrade?
I recently added a Hailo-8 PCIe card, but 25.04.1 booted fine with it prior to the update attempt, and it shows up in lspci. Could added-but-unused hardware cause an issue like this?
I did run a scrub and removed a few older boot environments before trying again with the manually downloaded 25.04.2.1 update. Good call on the memtest; going to run that now. Thank you for your help!
Two passes of memtest took over 3 hours; both finished with 0 errors.
Ran an extended SMART test on the boot drive as well; 0 errors there too.
The hardware is a Morefine S500+ mini PC: Ryzen 9 5900HX with 64 GB of G.Skill DDR4-3200 RAM, a 128 GB SATA SSD boot drive (an older Samsung 830 that was previously used in a laptop but never saw a huge amount of writes), and two Seagate FireCuda 2 TB SSDs for the storage pool.
One thing I did notice is that the broken boot environment for 25.04.2.1 was significantly larger than the older ones. I didn't write down the number before deleting it, but it was 3.x GB, while the older ones were between 2.6 and 2.8 GB.
The SATA SSD has 128 GB and is used exclusively for the OS. As already mentioned, I did delete a few older boot environments before trying the manually downloaded update file.
What would the next steps be? Trying 25.04.2.0? Or removing the extra hardware and retrying 25.04.2.1? Is there a logfile or some other way to find out what's broken? On boot it just says "initramfs checksum failed, press any key".
Continuing leads to "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)".
Stats/Settings
Boot Pool Condition: Degraded Check Alerts for more details.
Size: 118 GiB
Used: 10.78 GiB
Last Scrub Run: 2025-08-22 14:44:17
The degraded status is for 31 write errors on sda3; it didn't go away after a scrub.
The alert says
Critical
Boot pool status is DEGRADED: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected…
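(Worth noting, assuming standard OpenZFS behaviour, which TrueNAS SCALE uses: a clean scrub does not reset the per-device error counters, so the DEGRADED status and the 31 write errors will stick around until cleared explicitly:)

```shell
# Assumption: standard OpenZFS behaviour. A scrub verifies/repairs data
# but leaves the accumulated READ/WRITE/CKSUM counters in place.
zpool status -v boot-pool   # show per-device error counters
zpool clear boot-pool       # reset the counters; the pool returns to
                            # ONLINE unless new errors occur
```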
Noticed the message about an unknown AMD-Vi option in my "screenshot" and went spelunking in the BIOS: found "AMD CBS" → "NBIO COMMON" → "IOMMU" and changed it from [AUTO] to [ENABLED].
Tried installing 25.04.2 after removing the new PCIe card; the same error occurred for the regular boot option. However, the advanced boot option worked. (I never tried that with 25.04.2.1.)
Here's lscpu output, which shows AMD-V but not AMD-Vi (which is a shame, because it might mean I can't pass the PCIe card through to a VM/app?).
But given that this failure happens without the new card, and I hadn't activated any passthrough options for it yet, that shouldn't be the cause.
Looking at the GRUB config in /boot/grub/grub.cfg, the only difference between the default and advanced options seems to be the image used (production vs. debug).
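(To make that concrete, here's the kind of comparison involved; the menuentry contents below are a hypothetical sample, not copied from the real grub.cfg:)

```shell
# Hypothetical grub.cfg fragment, just to illustrate the shape of the
# entries; the real paths on a TrueNAS install will differ.
cat > /tmp/grub-sample.cfg <<'EOF'
menuentry 'TrueNAS SCALE' {
        linux /@/boot/vmlinuz-6.12.15-production+truenas
        initrd /@/boot/initrd.img-6.12.15-production+truenas
}
menuentry 'TrueNAS SCALE (Advanced)' {
        linux /@/boot/vmlinuz-6.12.15-debug+truenas
        initrd /@/boot/initrd.img-6.12.15-debug+truenas
}
EOF
# Only the linux/initrd lines decide what actually gets booted; run
# this against the real /boot/grub/grub.cfg to compare the entries.
grep -E '^[[:space:]]*(linux|initrd)' /tmp/grub-sample.cfg
```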
Still happening with the latest 25.04.2.3. Is there a way to get the output of whatever generated the initrd image?
Looking at the files themselves, I find it a bit suspicious that the one that's failing (6.12.15-production) was modified on Jun 27, while all the others were modified on May 26:
root@truenas:/home/admin# ls -la /boot/*
-rw-r--r-- 1 root root  7187285 May 26 14:52 /boot/System.map-6.12.15-debug+truenas
-rw-r--r-- 1 root root  7135076 May 26 14:54 /boot/System.map-6.12.15-production+truenas
-rw-r--r-- 1 root root   265130 May 26 14:52 /boot/config-6.12.15-debug+truenas
-rw-r--r-- 1 root root   264722 May 26 14:54 /boot/config-6.12.15-production+truenas
-rw-r--r-- 1 root root       15 Jun 27 11:51 /boot/initramfs_config.json
-rw-r--r-- 1 root root 79702016 May 26 16:45 /boot/initrd.img-6.12.15-debug+truenas
-rw-r--r-- 1 root root 78241680 Jun 27 11:52 /boot/initrd.img-6.12.15-production+truenas
-rw-r--r-- 1 root root  9761280 May 26 14:52 /boot/vmlinuz-6.12.15-debug+truenas
-rw-r--r-- 1 root root  9896448 May 26 14:54 /boot/vmlinuz-6.12.15-production+truenas
Is it safe to run update-initramfs, and are there any required parameters to pass?
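(For context: on plain Debian the usual invocation would be `update-initramfs -u -k <kernel-version>`, with `-v` for verbose output, but TrueNAS generates its boot images through its own update tooling and, as far as I know, keeps the root filesystem read-only, so hand-running it may not stick. A less invasive first check is whether the image even decompresses cleanly; the helper below is a sketch of my own, not a TrueNAS tool:)

```shell
# check_initrd: exit 0 if a compressed initrd image decompresses
# cleanly. The helper name and the format cases are assumptions;
# real initrds can be concatenated cpio archives, so this is only
# a sanity check, not a full validation.
check_initrd() {
  case "$(file -b "$1")" in
    gzip*)      gzip -t "$1" ;;
    Zstandard*) zstd -t "$1" 2>/dev/null ;;
    *)          return 2 ;;  # unrecognized format
  esac
}
# Example: check_initrd /boot/initrd.img-6.12.15-production+truenas && echo intact
```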
Your boot-pool was degraded. This could be due to a failing drive or faulty memory.
Trying to pinpoint different initramfses or messing with config files seems like you're skipping some of the fundamentals and making this too complicated.
The scrub revealed errors on the boot-pool? It appears so.
What about the results of full memtest passes? I would get that done first. You don't want to replace a boot drive only for this issue to come back because of bad RAM; if it passes full memtests without any errors, you can at least rule that out.
Do you have another drive to use for a boot-pool instead?
As mentioned above, I did run two full passes of memtest, which took over 3 hours and found no errors.
I also did run a scrub; just ran it again:
/home/admin# zpool status boot-pool
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:56 with 0 errors on Fri Aug 29 18:21:02 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sda3      ONLINE       0     0     0

errors: No known data errors
I installed three different versions (25.04.2, 25.04.2.1, 25.04.2.3), all showing the same issue: the debug image boots but the production image does not.
I'd like to rule out software issues before replacing the boot disk…
So I'd love to be able to see the logs of the update process and understand whether it created a faulty image (and if so, why that only failed after reboot instead of the checksum being validated immediately and the update failing/cancelling).
I'm in the lucky position of being able to move the hardware from my basement to my desk and check the boot screen after connecting a monitor, but it could be a lot worse if I weren't close to the hardware…
Notably, all the dates and the file size for initrd.img-6.12.15-production+truenas have changed.
Also, the used space for the 25.04.2.3-1 boot environment shown in the web UI has gone down to 2.81 GB, which is in line with the size of 25.04.1 (previous 25.04.2 versions showed 3.46 GB).
I'd still love to learn what exactly went wrong and how to get a more verbose log from the update process. It would also be great if the final result were validated somehow to prevent rebooting into an unusable state.