Upgrade to 25.10.2 fails on Proxmox VM

I’ve been running 25.10.1 in Proxmox on two servers for a month or so now without issues. After upgrading each of them from within the GUI to the 25.10.2 release, both kernel panic.

It seems that with either 25.10.2 GRUB entry, they’re unable to load the RAM disks.

I confirmed from GRUB that there is no ramdisk for the debug kernel in the 25.10.2 boot directory.

Manually booting back into 25.10.1 works fine.
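The check boils down to: does each file in the new boot directory exist, and does it match the known-good one? Something like the sketch below. The directory names are stand-ins (I'm not certain of the exact paths TrueNAS uses for its boot environments), and the demo fabricates both directories with mktemp so the snippet runs anywhere:

```shell
# Stand-in boot directories; on a real system these would be the 25.10.1
# and 25.10.2 boot environment directories.
old=$(mktemp -d)
new=$(mktemp -d)
printf 'good initrd'    > "$old/initrd"   # stand-in for the known-good copy
printf 'corrupt initrd' > "$new/initrd"   # stand-in for the corrupted copy
# (no initrd-debug is created in $new, mimicking the missing debug ramdisk)
status=""
for f in initrd initrd-debug; do
  if [ ! -e "$new/$f" ]; then
    status="$status$f: missing in new boot dir; "
  elif ! cmp -s "$old/$f" "$new/$f"; then
    status="$status$f: differs between boot dirs; "
  fi
done
echo "$status"
rm -rf "$old" "$new"
```

On a live system you would point `old` and `new` at the real boot directories instead of the mktemp stand-ins.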

Hosts: Mac Pro 2009 and MS-01

Proxmox QEMU Q35 VM, with a virtual disk for the VM’s root filesystem.

Please report a bug

Provide the host and VM specs.

I haven’t seen any similar reports, but with 2 VM failures, there is clearly a problem.

I’m having no luck getting Jira to work. It asks me to log in, and if I’m lucky enough to get the Create a ticket dialog to come up, it never finishes populating the drop-down menus so I can continue with the submission. I’ve tried with two browsers, with Pi-hole disabled, etc. For now, I’ll post what I’ve got here.

Proxmox 9.1.1 on both hosts.

Host 1:
Mac Pro 2009, Intel(R) Xeon(R) CPU X5690 @ 3.47GHz (2 Sockets)
RAM 128GB
Samsung 850 Pro 1TB SATA SSD for boot & VM storage

Host 2:
Minisforum MS-01, 13th Gen Intel(R) Core™ i9-13900H (1 Socket)
RAM 32GB
Kingston 1TB NVMe for boot & VM storage

VM:
Originally FreeNAS/TrueNAS CORE, built on ESXi
Cloned/Converted to Proxmox, upgraded to TrueNAS Scale
Cloned to both hosts
RAM: 16GB on MS-01, 48GB on Mac Pro
CPU: 8 Cores on MS-01, 6 cores on Mac Pro
SeaBIOS, q35, viommu=virtio
64GB drive, VirtIO SCSI for boot/root
Management NIC = VirtIO bridged to host
Traffic NIC = PCI SRIOV passed through (ConnectX-4 MT27700 MS-01, Chelsio T6225-CR Mac Pro)
Both use NVMe-oF disks exported by RDS2216 for ZFS pool testing
Mac Pro also has onboard Intel SATA controller passed through with six Samsung 840 EVO SATA SSDs

Both VMs, independent of their host, exhibit this behavior on just the 25.10.2 boot volume. No issues with 25.10.1. Something in the upgrade process isn’t properly copying over the initial RAM disks.

You suspect it’s the update process.
If you tried installing a new VM, would it fail?
(Or wait for 25.10.2.1.)

Anyone else with the same issue? I’m surprised you’re the only user reporting this.
The only unusual thing is the NVMe-oF disks.

Maybe provide a little more detail?

I just upgraded on my Proxmox and it was fine, going from 25.10.1 > 25.10.2, so this is not a generalized issue.

Note: I once had errors like yours on a physical machine when I found a disk had changed index. If you are passing through NVMe drives for booting TrueNAS, that can happen in a virtual env too. The error was caused by the initramfs startup sequence looking for the missing drive in a mirror to boot and not falling back to the one remaining drive (which makes me question why have an effing mirror, lol).

I also happened to upgrade my Proxmox host (did you do that any time recently?). If you did, note I had to finagle once again with my blacklists and modprobes. I’m wondering if your TrueNAS can see all the drives it is supposed to. Do you boot from a virtual disk or something else in the VM?

I boot from a virtual disk; all other disks are passed-through NVMe or a passed-through SATA controller, with the following config. This changes the binding order. While a softdep won’t prevent a host driver from grabbing something if it really wants to, it usually prevents it, and it matters because of how drivers are loaded BEFORE vfio-pci is swapped in by ha-manager on Proxmox (it’s a classic race condition).

root@pve-nas1:/etc/modprobe.d# cat vfio-pci.conf 
softdep nvme pre: vfio-pci
softdep ahci pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
options vfio-pci ids=2646:5024,8086:2700,1cc1:8201,1bb1:5018,1e60:2864,10de:1e07,1022:7901,10de:2bb1,10de:22e8
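One way to cross-check a long `ids=` list like that is to split it into one vendor:device pair per line and compare against `lspci -nn` output on the host. The sketch below writes a trimmed copy of the conf to a temp file so it is self-contained; on a live host you would point `$conf` at the real `/etc/modprobe.d/vfio-pci.conf` instead:

```shell
# Recreate a trimmed vfio-pci.conf in a temp file; on a real host, set
# conf=/etc/modprobe.d/vfio-pci.conf and skip the heredoc.
conf=$(mktemp)
cat > "$conf" <<'EOF'
softdep nvme pre: vfio-pci
softdep ahci pre: vfio-pci
options vfio-pci ids=2646:5024,8086:2700,1cc1:8201
EOF
# One vendor:device pair per line, ready to grep against `lspci -nn` output.
ids=$(sed -n 's/^options vfio-pci ids=//p' "$conf" | tr ',' '\n')
echo "$ids"
rm -f "$conf"
```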

I found that in the current version of Proxmox these do nothing (they used to, though). Including them for completeness in case you were using this to prevent drivers claiming devices. In the case below, this used udev to force vfio-pci onto my SATA controllers:

# Replace BDFs with yours — repeat these two lines for each controller you want on VFIO
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:ea:00.0", ATTR{driver_override}="vfio-pci"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:ea:00.0", RUN+="/bin/sh -c 'modprobe vfio-pci; echo 0000:ea:00.0 > /sys/bus/pci/drivers/vfio-pci/bind'"

ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:ea:00.1", ATTR{driver_override}="vfio-pci"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:ea:00.1", RUN+="/bin/sh -c 'modprobe vfio-pci; echo 0000:ea:00.1 > /sys/bus/pci/drivers/vfio-pci/bind'"

ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:42:00.0", ATTR{driver_override}="vfio-pci"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:42:00.0", RUN+="/bin/sh -c 'modprobe vfio-pci; echo 0000:42:00.0 > /sys/bus/pci/drivers/vfio-pci/bind'"

ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:42:00.1", ATTR{driver_override}="vfio-pci"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:42:00.1", RUN+="/bin/sh -c 'modprobe vfio-pci; echo 0000:42:00.1 > /sys/bus/pci/drivers/vfio-pci/bind'"

# …add entries for the remaining 7 BDFs…

Lots of random info there, but good luck. Post more info if you can. I’m about to fly for a week so won’t be able to reply.

tl;dr: there is no general 25.10.2-in-Proxmox issue.

@scyto The boot volume is a virtual disk. I’m passing through a SATA HBA and a NIC VF on the Mac, and just the NIC VF on the MS-01. And this is failing immediately, just after the kernel is loaded. One initrd is failing its checksum, and the debug kernel can’t find its initial RAM disk in the 25.10.2 ROOT directory. (I could try copying over 25.10.1’s initrds, though…)

The NVMe-oF drives are loaded by TrueNAS via the nvmf-autoconnect service, not Proxmox, so at the boot stage they’re not even a factor.

From the screenshots you can see grub is complaining that the checksum verification of the primary initrd failed, and the second one it’s saying it can’t find the initrd for the debug kernel. So, yes, during the upgrade process, some script is burping or some download is failing or getting corrupted.

I can spin up a new VM and do a fresh install/upgrade to see what it does. There may be something unique with the boot volume since it has undergone a few changes (ESXi to Proxmox, CORE to SCALE). Since it was cloned after the conversion/upgrade to 25.10.1, that could explain why both VMs are experiencing the exact same problem.


Gah, I was really hoping this was a real disk. Interesting failure case. I can share my VM config if it helps identify a difference that could indicate a root cause. But I noticed there is a patch coming soon, and they may know the issue?

Same thing happens on updating to 25.10.2.1.

On a hunch, I decided to copy over the kernel, initrd, etc. from the /boot directory on 25.10.1 to 25.10.2.1. diff says they’re different, despite having the same version number. I figured it wouldn’t hurt. Then I rebooted and it came up without a hitch.
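For anyone else hitting this, the shape of that workaround is sketched below. The paths are stand-ins (the demo fabricates both directories with mktemp so it is runnable as-is); the one step I would add is backing up the broken files first so the change is reversible:

```shell
# Stand-ins for the known-good (25.10.1) and broken (25.10.2.1) boot dirs;
# on a real system, substitute the actual boot environment paths.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'known-good initrd' > "$src/initrd"
printf 'corrupt initrd'    > "$dst/initrd"
mkdir "$dst/backup"
cp "$dst/initrd" "$dst/backup/"   # keep the broken copy for later comparison
cp "$src/initrd" "$dst/initrd"    # overwrite with the known-good file
cmp -s "$src/initrd" "$dst/initrd" && result=ok || result=mismatch
echo "copy verified: $result"
rm -rf "$src" "$dst"
```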

So, on my system at least, something is corrupting the kernel and/or initial RAM disk during the update process. Just from 25.10.1 to 25.10.2.x. We got to 25.10.1 from CORE 13 etc.

And on both versions there is no debug initial RAM disk. Not sure if that’s a feature or a bug. (Why list it in GRUB if it’s not going to work?)
