Pools:
6x Toshiba 16TB in a RAID-Z2 pool named big
2x WD shucked 8TB in a striped pool named small
Server specs:
Asus W680 IPMI Mobo
Intel 14600K
2x48GB DDR5
Usually an LSI 9500-8i, but currently an LSI 9300-8i card for troubleshooting
Latest Proxmox 8.x with PCIe passthrough configured correctly to the TrueNAS VM
TrueNAS SCALE 25.04.1
Something happened about 6 months ago and my TrueNAS became super unstable.
Switched from a 13600K to a 14600K under warranty.
Swapped the RAM as it was faulty in memtest.
Rebuilt the VM and imported the config from backup, resulting in constant rebooting during boot-up at the ix services (ix-netif.service and ix-zfs.service).
Desperate, I unplugged both WD drives and it booted!
Plugged one WD drive back in and it still boots fine, but now in the GUI one of the 16TB drives is showing up as belonging to the pool small?
Yet it appears to be where it should be in the zpool status output; I'm so confused.
How can I be confident of what is actually happening, and how do I remedy the problem?
On top of that, one of the 16TB drives needs to be replaced; it's throwing a few light uncorrectable errors but is still working. Good grief!
The PSU is a SuperFlower 1000W heavy-duty beast; I don't have the exact model handy at the moment.
The new RAM and CPU pass all testing, including memtest, just fine.
I have scheduled SMART short tests, and that's where the problem with one of the drives was identified once the VM was able to boot.
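In case it is useful, this is roughly how I'm checking the suspect drive from the TrueNAS shell; the device name sdc is just an example, substitute whichever 16TB disk is being flagged:

# kick off a short self-test on the suspect drive (device name is an example)
smartctl -t short /dev/sdc
# a few minutes later, review the test result and the error counters
smartctl -a /dev/sdc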
I should note that I am absolutely confident in the Proxmox install and the VM setup for TrueNAS. This is not my first rodeo with a TrueNAS VM; I have kept the same pool through several hardware iterations, drive upgrades, and a switch from ESXi to Proxmox over the years.
It ran fine for about 6 months until all the CPU and memory issues seemed to hit at once.
It may well be a bug, but it's a bit concerning seeing it there with the other (broken) pool's name.
I can't actually get it to boot with both WD drives plugged in.
It goes into a boot loop when it's starting ix-zfs.service and ix-netif.service, until I remove one WD drive.
Mind you, it does show both WD drives during boot, prior to it boot looping. I'll try to attach a video of it POSTing with both drives.
I have only ever tested with the same WD drive attached; I should try booting with the other WD drive on its own as well. I'll be even more puzzled if that does boot!
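The next time it boot loops I'll try to grab the logs for those services. Something like this from the TrueNAS shell should do it, assuming persistent journalling so the previous (failed) boot is still available; treat it as a sketch rather than a recipe:

# status of the ZFS import service on the current boot
systemctl status ix-zfs.service
# journal for the same service from the previous boot (needs a persistent journal)
journalctl -b -1 -u ix-zfs.service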
I wanted to embed images but could not, and I need to work out what has happened to my SSH access. Please bear with me.
Right! I was not aware of that all this time; trying it now. It seems it was Ctrl + Insert.
root@storage[~]# zpool status
  pool: big
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 823G in 03:33:57 with 6 errors on Fri Aug 15 03:36:29 2025
config:

        NAME        STATE     READ WRITE CKSUM
        big         ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda2    ONLINE       0     0    12
            sdd2    ONLINE       0     0    12
            sdb2    ONLINE       0     0    12
            sdc2    ONLINE       0     0    12
            sdf2    ONLINE       0     0    12
            sdg2    ONLINE       0     0    12

errors: 8 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Sat Aug 9 03:45:05 2025
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          vda3       ONLINE       0     0     0

errors: No known data errors
Name    Serial          Disk Size    Pool
sdc     9130A05MFWTG    14.55 TiB    big
sdd     9130A02KFWTG    14.55 TiB    big
sdf     9120A0B3FWTG    14.55 TiB    big
sdg     9130A04WFWTG    14.55 TiB    small (Exported)
sdb     9130A03NFWTG    14.55 TiB    big
sda     23D0A0UGF4MJ    18.19 TiB    big
sde     7SH5AU9D        7.28 TiB     small (Exported)
vda                     32 GiB       boot-pool
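Since the status output says to use '-v', the next step I plan to try is listing exactly which files are affected by those 8 data errors, along the lines of:

# list the individual files affected by the reported data errors
zpool status -v big

If the affected files can be restored from backup (or deleted), my understanding is that a subsequent scrub plus 'zpool clear big' should reset the error counters, but I'd welcome confirmation on that.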
A small update: your comment has me quite worried at this point. What is going on here!?
I unplugged the WD drive again and everything is back to normal now; the big pool has all of its drives pointing to it.
I have not found a solution to this issue, or even worked out how it got to this point, and would appreciate any help.
Further to this, I now reckon the other pool “small” is probably fine, and it may explain the boot loop: maybe the two small-pool drives are interfering with the big pool, causing the whole issue.
So I will keep the 2x WD drives out of the picture for now, but I need to figure out how to get the UUIDs picked up permanently.
Did you also blacklist the PCIe device so that Proxmox can’t use it?
Passthrough is not enough.
You can export the pool and re-import it using the partuuids instead.
I believe you just add -d /dev/disk/by-partuuid/ to the import command, but someone else here can probably chime in on that.
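From memory it would be something like this from the TrueNAS shell, using your pool name; double-check before running, and make sure nothing is using the pool when you export it:

# export the pool, then re-import it scanning devices by partuuid
zpool export big
zpool import -d /dev/disk/by-partuuid/ big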
Yes, the mpt3sas driver is blacklisted for the LSI cards.
The vfio modules are enabled, and I can confirm vfio-pci is the driver in use.
IOMMU is enabled and confirmed via “dmesg | grep -e DMAR -e IOMMU -e AMD-Vi”.
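For completeness, this is roughly what that looks like on the Proxmox host; the modprobe.d filename and the PCI address are examples from my notes, not necessarily what anyone else's system will use:

# /etc/modprobe.d/blacklist.conf on the Proxmox host
blacklist mpt3sas

# confirm which driver the HBA is bound to (PCI address is an example)
lspci -nnk -s 01:00.0
# expected to show: Kernel driver in use: vfio-pci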