Instability with Proxmox SATA controller PCI passthrough to TrueNAS Scale 24 VM

Endorse451 · July 27, 2024, 9:22am

I’m asking for support on how to identify a problem in my homelab, where I have two similarly configured systems with TrueNAS Scale running in VMs over Proxmox VE 8.2.4.

The two systems run on significantly different hardware: A tiny mini PC with PCI passthrough to host TrueNAS VM of a single SDD (to host TrueNAS datasets) and a mini PC with hardware passthrough to host TrueNAS VM of an additional SATA controller board (again, to host TrueNAS datasets).

On both systems, Proxmox boots from distinct NVMe drives and it does not use neither the single SDD nor the additional SATA controller board passed to the TrueNAS VMs hosted on two Proxmox systems.

The two Proxmox systems run the hosted TrueNAS Scale 23 for months, with no troubles, being stable and reliable, until I decided, a few months ago, to upgrade both systems to Scale 24 Dragonfish.

After upgrading both to Scale 24, the system with PCI passthrough of the additional SATA controller board became unstable, experiencing various problems. Unable at that time to address the problem, I decided to step back that system to Scale 23.10.2, while leaving the other smaller system running Scale 24, while waiting for newer Dragonfish releases and available time before attempting to upgrade it again.
Both systems run stable in the meanwhile, one with Scale 23 and the other with Scale 24.

A few weeks ago I decided to address the problem again, attempting to upgrade the system running Scale 23 to Scale 24.04.
Having Proxmox servers over the hardware, I decided to create a second VM to run, alternatively, Scale 23 and 24 on the same hardware presenting the issues, to investigate the problem, since it’s unclear to me if the problem is related to the VM configuration in Proxmox, to TrueNAS, or to something else.

I did many attempts, configuring the new VM hosting Scale 24 with different BIOS (SeaBIOS - OVMF), Machine (i440fx - q35) , PCI Device options and underlying Proxmox drivers, and installing Scale 24.04 many times over it, but the system runs stable only using Scale 23.10.2.
I found no clue on how to address the problem or how to investigate further.

While running Scale 24.04, the system reports all of the sudden & all together, within 24h from boot, problems with the attached disks via the additional SATA controller, like [“any disks” being, in any order, anyone of the eight disks in the JBOD]:

Device: “any disks” [SAT], not capable of SMART self-check.
Device: “any disks” [SAT], failed to read SMART Attribute Data.
Device: “any disks” [SAT], Read SMART Error Log Failed.
Device: “any disks” [SAT], Read SMART Self-Test Log Failed.
Device: “any disks” [SAT], not capable of SMART self-check.

The problems persist, with the system unable to access the disks.
Properly shutting down the system (long and troubled) and booting up again, all of the disks are accessible again, marked in DEGRADED status but with no error reported.

SYSTEM INFO:
The underlying hardware experiencing the instability only using Scale 24 is a Microforum MS01 (i9 13900H) where I installed the QXP-800eS-A1164 additional SATA controller coming with a QNAP 8BAY TL-D800S JBOD, and where I set up Proxmox with PCI passthrough to the TrueNAS VM of both ASM1164 controllers of the QXP board.

Microforum MS01 i9 13900H
96 GB RAM
512 GM SDD (Proxmox boot & VM volumes)

Proxmox VE 8.2.4 (kernel 6.8.8-2-pve)

TrueNAS VM (current setup, after other attempts)
16 GB RAM
Processors Type Host, 2 sockets, 2 cores
BIOS OVMF
Machine q35,viommu=intel
SCSI Controler VirtIO SCSI single
PCI Device passthrough:
hostpci0: 0000:03.00.0 (ASM1164 #1 on QXP board)
hostpci1: 0000:06.00.0 (ASM1164 #2 on QXP board)

I appreciate suggestions on how to solve and/or how to investigate the problem.

Thanks.

neofusion · July 27, 2024, 9:45am

One thing stands out, the SATA controller card is, going by ASMEDIA’s description, a SATA Port multiplier. Use of those cards is discouraged due to a long history of issues.

Maybe that’s why, with a newer driver/kernel in Dragonfish, you suddenly see issues. It could be down to the code for that hardware simply not being as well tested.

Also, when you write “SDD” do you mean Solid-state Drive (SSD) or something else?

Endorse451 · July 27, 2024, 10:14am

Yes, thanks, I’m aware the controller card coming with the QNAP JBOD is a cheap one. Being my homelab, I considered purchasing a different card to control the drives in the JBOD, but I decided to give it a try (and it run stable for months).
I’ll evaluate the opportunity to buy a new controller card (if unable to solve the instability).

Yes, correct, SSD (not SDD). Sometimes I mistype, even in my primary language.

Farout · July 27, 2024, 12:09pm

Multiple users in this forum have reported that upgrading to Dragonfish caused problems with ASMEDIA cards.

As said by @neofusion , best to stay away from them.

Stux · July 28, 2024, 2:36am

Can you confirm you are testing with Dragonfish 24.04.2? Which uses a newer kernel again and ostensibly restores asmedia port multiplier functionality (they’re still crap though)

Endorse451 · July 28, 2024, 9:43am

Yes, I confirm.
I am testing with Dragonfish-24.04.2.

bonobo · January 8, 2025, 1:14pm

Hello, apols for the noobish question but would other small HBA cards such as the Dell H200E (in IT mode) or the LSI SAS 9200-8i work with the QNAP TL D800s?

I have an MS-01 and am exploring setting up my first NAS using that QNAP 8 bay JBOD and TrueNAS Scale running in a KVM. I don’t want instability and would happily spend an extra £35 to ditch the ASM1164 for a more reliable card.

Advice very much appreciated.

Endorse451 · January 8, 2025, 2:49pm

I’m sorry, but I’m unable to give you advice: I did not test other SATA controllers on the MS-01 and QNAP JBOD.

I decided to remain with TrueNAS Scale 23 (not upgrading to Dragonfish) and I’m still using the dual ASM1164 SATA controller card (the one coming with the QNAP JBOD). Having no time to seek an alternative to the ASMEDIA controller, I decided to wait and see.
Five months later, I can say I had no instabilities from the TrueNAS Scale 23 virtualized over the MS-01, using the ASMEDIA SATA Controller to connect to the JBOD. I’m aware that my current solution is sub-optimal, but it is my homelab, not a production system.

Anyhow, I’m interested to ear experience from anyone testing other SATA controller on MS-01 and QNAP JBOD with virtualized TrueNAS Scale. I may decide to ditch the ASMEDIA controller in the future.

bonobo · January 8, 2025, 7:55pm

Nice to hear you’ve had no issues, thanks for the followup!

jro · April 14, 2025, 3:15pm

Are you able to upgrade to 25.04 to check if this issue still persists? If it does, can you open a bug ticket? There were some upstream changes to the Asmedia drivers and I’m hoping this has been resolved.

Endorse451 · April 14, 2025, 5:13pm

I’ll give it a try and let you know, but no sooner than a few weeks.

jro · April 14, 2025, 5:24pm

Not a problem, and no rush. I appreciate you experimenting!

jjrushford · April 18, 2025, 3:03am

I was reading through this post from the beginning and thought I’d share my setup. I currently have two Proxmox hosts each running TrueNAS vm’s with PCI pass through. On one Proxmox host, I’m passing through an 8 port ASMEDIA 1064 SATA controller and on the other Proxmox host, I’m passing through a 16 port LSI 9300 SATA controller. I’ve not seen any instability on either TrueNAS VM. I just upgraded from Dragonfish to 25.04 just a couple of days ago and have not had any problems on 25.04 either. I did have to update the firmware on the LSI though to Version 16.00.00.00 (2017.05.02) as was recommended to me on this forum. But I’ve not seen any issues with TrueNAS 24 or 25