Instability with Proxmox SATA controller PCI passthrough to TrueNAS Scale 24 VM

I’m asking for support on how to identify a problem in my homelab, where I have two similarly configured systems with TrueNAS Scale running in VMs over Proxmox VE 8.2.4.

The two systems run on significantly different hardware: A tiny mini PC with PCI passthrough to host TrueNAS VM of a single SDD (to host TrueNAS datasets) and a mini PC with hardware passthrough to host TrueNAS VM of an additional SATA controller board (again, to host TrueNAS datasets).

On both systems, Proxmox boots from distinct NVMe drives and it does not use neither the single SDD nor the additional SATA controller board passed to the TrueNAS VMs hosted on two Proxmox systems.

The two Proxmox systems run the hosted TrueNAS Scale 23 for months, with no troubles, being stable and reliable, until I decided, a few months ago, to upgrade both systems to Scale 24 Dragonfish.

After upgrading both to Scale 24, the system with PCI passthrough of the additional SATA controller board became unstable, experiencing various problems. Unable at that time to address the problem, I decided to step back that system to Scale 23.10.2, while leaving the other smaller system running Scale 24, while waiting for newer Dragonfish releases and available time before attempting to upgrade it again.
Both systems run stable in the meanwhile, one with Scale 23 and the other with Scale 24.

A few weeks ago I decided to address the problem again, attempting to upgrade the system running Scale 23 to Scale 24.04.
Having Proxmox servers over the hardware, I decided to create a second VM to run, alternatively, Scale 23 and 24 on the same hardware presenting the issues, to investigate the problem, since it’s unclear to me if the problem is related to the VM configuration in Proxmox, to TrueNAS, or to something else.

I did many attempts, configuring the new VM hosting Scale 24 with different BIOS (SeaBIOS - OVMF), Machine (i440fx - q35) , PCI Device options and underlying Proxmox drivers, and installing Scale 24.04 many times over it, but the system runs stable only using Scale 23.10.2.
I found no clue on how to address the problem or how to investigate further.

While running Scale 24.04, the system reports all of the sudden & all together, within 24h from boot, problems with the attached disks via the additional SATA controller, like [“any disks” being, in any order, anyone of the eight disks in the JBOD]:

  • Device: “any disks” [SAT], not capable of SMART self-check.
  • Device: “any disks” [SAT], failed to read SMART Attribute Data.
  • Device: “any disks” [SAT], Read SMART Error Log Failed.
  • Device: “any disks” [SAT], Read SMART Self-Test Log Failed.
  • Device: “any disks” [SAT], not capable of SMART self-check.

The problems persist, with the system unable to access the disks.
Properly shutting down the system (long and troubled) and booting up again, all of the disks are accessible again, marked in DEGRADED status but with no error reported.

SYSTEM INFO:
The underlying hardware experiencing the instability only using Scale 24 is a Microforum MS01 (i9 13900H) where I installed the QXP-800eS-A1164 additional SATA controller coming with a QNAP 8BAY TL-D800S JBOD, and where I set up Proxmox with PCI passthrough to the TrueNAS VM of both ASM1164 controllers of the QXP board.

Microforum MS01 i9 13900H
96 GB RAM
512 GM SDD (Proxmox boot & VM volumes)

Proxmox VE 8.2.4 (kernel 6.8.8-2-pve)

TrueNAS VM (current setup, after other attempts)
16 GB RAM
Processors Type Host, 2 sockets, 2 cores
BIOS OVMF
Machine q35,viommu=intel
SCSI Controler VirtIO SCSI single
PCI Device passthrough:
hostpci0: 0000:03.00.0 (ASM1164 #1 on QXP board)
hostpci1: 0000:06.00.0 (ASM1164 #2 on QXP board)

I appreciate suggestions on how to solve and/or how to investigate the problem.

Thanks.

One thing stands out, the SATA controller card is, going by ASMEDIA’s description, a SATA Port multiplier. Use of those cards is discouraged due to a long history of issues.

Maybe that’s why, with a newer driver/kernel in Dragonfish, you suddenly see issues. It could be down to the code for that hardware simply not being as well tested.

Also, when you write “SDD” do you mean Solid-state Drive (SSD) or something else?

2 Likes

Yes, thanks, I’m aware the controller card coming with the QNAP JBOD is a cheap one. Being my homelab, I considered purchasing a different card to control the drives in the JBOD, but I decided to give it a try (and it run stable for months).
I’ll evaluate the opportunity to buy a new controller card (if unable to solve the instability).

Yes, correct, SSD (not SDD). Sometimes I mistype, even in my primary language.

Multiple users in this forum have reported that upgrading to Dragonfish caused problems with ASMEDIA cards.

As said by @neofusion , best to stay away from them.

Can you confirm you are testing with Dragonfish 24.04.2? Which uses a newer kernel again and ostensibly restores asmedia port multiplier functionality (they’re still crap though)

Yes, I confirm.
I am testing with Dragonfish-24.04.2.

Hello, apols for the noobish question but would other small HBA cards such as the Dell H200E (in IT mode) or the LSI SAS 9200-8i work with the QNAP TL D800s?

I have an MS-01 and am exploring setting up my first NAS using that QNAP 8 bay JBOD and TrueNAS Scale running in a KVM. I don’t want instability and would happily spend an extra £35 to ditch the ASM1164 for a more reliable card.

Advice very much appreciated.

I’m sorry, but I’m unable to give you advice: I did not test other SATA controllers on the MS-01 and QNAP JBOD.

I decided to remain with TrueNAS Scale 23 (not upgrading to Dragonfish) and I’m still using the dual ASM1164 SATA controller card (the one coming with the QNAP JBOD). Having no time to seek an alternative to the ASMEDIA controller, I decided to wait and see.
Five months later, I can say I had no instabilities from the TrueNAS Scale 23 virtualized over the MS-01, using the ASMEDIA SATA Controller to connect to the JBOD. I’m aware that my current solution is sub-optimal, but it is my homelab, not a production system.

Anyhow, I’m interested to ear experience from anyone testing other SATA controller on MS-01 and QNAP JBOD with virtualized TrueNAS Scale. I may decide to ditch the ASMEDIA controller in the future.

1 Like

Nice to hear you’ve had no issues, thanks for the followup!