Moved TrueNAS From Bare Metal to a Proxmox VM: Input/Output Errors and Can’t Get to Data

Cross-posting my thread from r/truenas since no one has responded yet. I sure hope someone has some input or help; I’m lost big time.


I had TrueNAS SCALE 24 on a standalone server (Intel i3-7100T, 64 GB ECC RAM, and 8x12 TB WD drives on an HBA).

I’ve been working toward a new setup, so I pulled the trigger and moved all the drives from the old server to a new one: a VM under Proxmox (an i10 Intel CPU with 128 GB of RAM for Proxmox). I gave the VM 4 CPUs and 32 GB of RAM and passed the new HBA through to it.

I did an export/disconnect of the drives on the old TrueNAS server, then imported the pool and restored the settings from backup on the new VM (also SCALE 24). Everything came back as expected, with no issues that I could see.

I then upgraded from SCALE 24 to 25, and that went well with no issues.

I then added 4 new drives to the pool (2 vdevs of 2x8 TB drives) so I could expand it.

Then I got some kind of issue/warning inside TrueNAS about mismatched drive sizes in the pool, so I pulled those drives from the tank pool and left the original 8x12 TB drives just as they were on the old server.

Then I started to notice issues: I could not (and still cannot) copy files from the old pool to the new pool without some kind of error. So I ran a scrub, saw the problems listed in zpool status -v, and replaced every file noted as corrupted. I then re-ran the scrub, which just finished and showed no known errors when I looked at it.

So I tried to copy a movie file from the old movie pool to the new movie pool (the new pool with the 4x8 TB drives), and cp in the shell fails with an “input/output” error and will not copy the file. It creates the folder on the new pool, but then fails with this input/output error. Now, if I run zpool status -v, every file I tried to copy shows permanent corruption. I don’t understand why a full scrub reports everything as good to go right after it finishes, yet simply accessing and copying a file makes it corrupt. I’ve included the latest zpool status -v output below. I sure hope someone can help out; I’m puzzled!

```
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 18:17:41 with 0 errors on Wed Aug 20 09:41:41 2025
remove: Removal of vdev 4 copied 826M in 0h0m, completed on Sun Aug 17 15:03:40 2025
        7.62K memory used for removed device mappings
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank                                      ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            1ea9f584-fe96-4828-a048-045596a16cb9  ONLINE       0     0     0
            e080a44c-e495-4cec-8d05-05b5802d4bf2  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            b06d42f8-2eda-4678-b677-54ed1c683378  ONLINE       0     0     0
            e7574d9b-71a0-45d5-8b72-8d8fbf4b307b  ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            016eb050-41c1-4abf-9a8e-6b30209dedad  ONLINE       0     0     2
            58fa4281-e2bf-4cc2-9079-42194c03fd36  ONLINE       0     0     2
          mirror-3                                ONLINE       0     0     0
            4879981d-e447-40b0-89a6-e828a78d31ca  ONLINE       0     0     4
            05e86aaf-f752-4a02-a404-35d0317d7a98  ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        /mnt/tank/media/movies/A Boy Named Charlie Brown (1969)/A Boy Named Charlie Brown (1969).mkv
        /mnt/tank/media/movies/1917 (2019)/1917 (2019).mkv
```
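For anyone else trying to read output like this: the CKSUM column counts checksum errors ZFS detected on each device, and the files under `errors:` are ones ZFS could not repair from redundancy. Here both disks in mirror-2 and mirror-3 show checksum errors, which is consistent with a problem common to several disks (such as the controller) rather than one failing drive. Below is a minimal Python sketch that pulls those two things out of saved `zpool status -v` text; the function names are hypothetical, it assumes the usual NAME/STATE/READ/WRITE/CKSUM row layout, and it only parses text without touching the pool.

```python
import re
from typing import Dict, List, Tuple

# One device row in the config: table, e.g. "mirror-2 ONLINE 0 0 2".
# Counters can carry suffixes such as "1.2K", so they are kept as strings.
DEVICE_ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def parse_zpool_status(text: str) -> Tuple[Dict[str, Tuple[str, str, str]], List[str]]:
    """Return ({device: (READ, WRITE, CKSUM)}, [entries listed under 'errors:'])."""
    counters: Dict[str, Tuple[str, str, str]] = {}
    files: List[str] = []
    in_errors = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            in_errors = True  # every non-blank line after this is an affected file
            continue
        if in_errors:
            if stripped:
                files.append(stripped)
            continue
        match = DEVICE_ROW.match(line)
        if match:
            name, _state, read, write, cksum = match.groups()
            counters[name] = (read, write, cksum)
    return counters, files


def nonzero_devices(counters: Dict[str, Tuple[str, str, str]]) -> List[str]:
    """Names of pool members whose READ, WRITE or CKSUM counter is not zero."""
    return [name for name, values in counters.items() if any(v != "0" for v in values)]
```

To use it, save the output with `zpool status -v tank > status.txt` and pass the file's contents to `parse_zpool_status`. On the output above, `nonzero_devices` would return the four GUIDs under mirror-2 and mirror-3.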

Just as an FYI: I tried to copy one more item from the NAS to my local Windows desktop (to take dataset-to-dataset copying out of the equation). It errored immediately, and that file now shows as corrupt in zpool status -v.

Even just trying to play a media file results in corruption. So I’m thinking no backups, don’t touch the files, etc., until hopefully someone on here can put me on a path. I may have just lost all my data. (sigh)

Here are the dmesg logs; I’m not sure how to read them. I tried to paste them here, but I guess Reddit has a character limit or something, so I put them on Pastebin. For fun, I also fed them to an AI to poke at, and its conclusion is pasted below.

dmesg logs on Pastebin: ask me for the link, since apparently I can’t post it here.

Conclusion

The dmesg logs reveal that your SAS controller (mpt3sas_cm0) is experiencing faults and resets, leading to I/O errors and command timeouts on multiple disks within your tank pool. This is the root cause of the data corruption you’re seeing. It’s likely a hardware issue with the HBA itself, the SAS cables, the drives, or even potentially the power supply to the drives.
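Since raw dmesg output is hard to read and an AI summary is only a starting point, a plain keyword filter is one way to pull out the relevant lines yourself. This is a minimal sketch: the controller name mpt3sas_cm0 comes from the summary above, but the keyword list is an assumption about what is worth flagging, not a catalogue of actual mpt3sas messages, and the function name is hypothetical.

```python
from typing import Iterable, List

# Assumed keywords worth flagging; adjust to whatever your log actually shows.
DEFAULT_KEYWORDS = ("fault", "reset", "timeout", "timed out", "i/o error")


def suspicious_lines(lines: Iterable[str], controller: str = "mpt3sas_cm0",
                     keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Keep lines that mention the controller or contain any keyword (case-insensitive)."""
    needles = [controller.lower()] + [k.lower() for k in keywords]
    hits: List[str] = []
    for line in lines:
        lowered = line.lower()
        if any(needle in lowered for needle in needles):
            hits.append(line.rstrip("\n"))
    return hits
```

For example, capture the log with `dmesg > dmesg.txt` on the Proxmox host, then open that file and pass the file object straight to `suspicious_lines`, since it iterates line by line.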

Did you blacklist your HBA on Proxmox so that the host doesn’t randomly try to take control of your pool? Proxmox can use ZFS itself, and folks in the past have had critical pool failures when Proxmox randomly decides to claim the pool for itself instead of leaving it to TrueNAS.

I did not know I needed to do that. Is there a way to see whether that is happening, or a link to point me in the right direction?

That seems plausible. I look forward to your response. Thanks for reaching out; I appreciate it!

Reading up on it now…


I wish I had specific documentation to provide, but I don’t. I ended up building a second system specifically for Proxmox when I realized my virtualization needs were more than TrueNAS could handle, and I never wanted to virtualize my NAS…

I simply know that this is a deadly pitfall, and I hope this knowledge resolves your issues.

It was not blacklisted. I’ve done that now, and it looks good:

```
04:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
	Subsystem: Broadcom / LSI SAS 9300-16i [1000:3130]
	Kernel driver in use: vfio-pci
	Kernel modules: mpt3sas

06:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
	Subsystem: Broadcom / LSI SAS 9300-16i [1000:3130]
	Kernel driver in use: vfio-pci
	Kernel modules: mpt3sas
```

Digging some more to confirm the IOMMU settings (I know it’s enabled in the BIOS)
and to make sure I have all the settings I need for the passthrough.


No worries, and thanks for putting me on the right track, since this is apparently needed.


Heads-up: the data in your pool since you virtualized may be suspect, since the HBA wasn’t blacklisted from the start. It may be worth investigating, though it’s possible you’ve been lucky so far.

This is one of those awful things, like port multipliers: it works fine at first, and then eventually it doesn’t and everything is horrible.
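If you do want to investigate, one rough way to narrow it down is to list files whose modification time falls on or after the day the pool moved into the VM, and check those first. This is a hypothetical Python sketch: the thread doesn’t give the migration date, so the cutoff is yours to fill in, and mtime is only a heuristic, since copies can preserve original timestamps.

```python
import os
import stat
from datetime import datetime
from typing import Iterator, Tuple


def modified_since(root: str, cutoff: datetime) -> Iterator[Tuple[str, datetime]]:
    """Yield (path, mtime) for regular files under `root` modified at or after `cutoff`."""
    cutoff_ts = cutoff.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # vanished or unreadable entry; skip it rather than abort the walk
            if stat.S_ISREG(info.st_mode) and info.st_mtime >= cutoff_ts:
                yield path, datetime.fromtimestamp(info.st_mtime)
```

For example, `modified_since("/mnt/tank/media", datetime(2025, 8, 1))` walks the media dataset from the thread; the date there is a placeholder, not the actual migration date. This only reads metadata, not file contents.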

OK, so far it’s looking good. Here are the changes I made:

  1. Confirmed IOMMU/VT-d/AMD-V was enabled in the BIOS (in my case an MPG Z590 Gaming Force motherboard)
  2. Identified the HBA driver, which was mpt3sas
  3. Blacklisted that driver on the Proxmox host
  4. Configured early binding of the HBA to the vfio-pci driver in the boot process
  5. Ran update-initramfs -u
  6. Rebooted
  7. Ran lspci -nn -k again and confirmed that Proxmox was no longer using the HBA
  8. Made sure the passthrough settings were right; I ended up enabling full function and PCI-Express on the SAS HBA
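Steps 3 and 4 usually come down to a few lines in a file under /etc/modprobe.d/ on the Proxmox host. The poster didn’t share their exact files, so this is a sketch of one common approach rather than their config: a hypothetical Python helper that pulls the [vendor:device] ID out of `lspci -nn` header lines (1000:0097 for the SAS3008 above) and builds `blacklist`, `options vfio-pci ids=<ids>` and `softdep <driver> pre: vfio-pci` lines. Note that blacklisting mpt3sas removes that driver for every mpt3sas controller on the host, so it only makes sense if Proxmox itself doesn’t need disks on such a controller.

```python
import re
from typing import List

# Device header lines begin with a PCI address; Subsystem:/Kernel lines do not.
SLOT = re.compile(r"^(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\s", re.IGNORECASE)
VENDOR_DEVICE = re.compile(r"\[([0-9a-f]{4}:[0-9a-f]{4})\]", re.IGNORECASE)


def device_ids(lspci_nn_text: str, name_contains: str) -> List[str]:
    """Unique vendor:device IDs of devices whose header line contains `name_contains`."""
    ids: List[str] = []
    for line in lspci_nn_text.splitlines():
        if not SLOT.match(line) or name_contains not in line:
            continue  # only header lines for the hardware we care about
        # The class code "[0107]" has no colon, so only "[1000:0097]"-style IDs match.
        found = VENDOR_DEVICE.findall(line)
        if found and found[-1] not in ids:
            ids.append(found[-1])
    return ids


def modprobe_passthrough_conf(ids: List[str], host_driver: str) -> str:
    """Text for a modprobe.d file that keeps `host_driver` off these devices."""
    return "\n".join([
        f"# keep the Proxmox host's {host_driver} driver away from the passthrough HBA",
        f"blacklist {host_driver}",
        f"options vfio-pci ids={','.join(ids)}",
        f"softdep {host_driver} pre: vfio-pci",
    ]) + "\n"
```

The output of `modprobe_passthrough_conf(device_ids(text, "SAS3008"), "mpt3sas")` could be saved as, for example, /etc/modprobe.d/hba-passthrough.conf (a hypothetical name; modprobe reads any `*.conf` file there), followed by update-initramfs -u and a reboot as in steps 5 and 6. Binding by ID covers both 04:00.0 and 06:00.0, since they share 1000:0097.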

So far… I was able to copy a media file to my local desktop. The copy went through, I was able to watch it, and the file I copied is not showing as corrupt in zpool status -v afterward.
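Reading a file end to end is essentially what that copy test does: ZFS verifies checksums on the blocks it reads from disk and returns an I/O error (EIO) when no good copy exists, which matches the cp failures earlier in the thread. Here is a minimal Python sketch for checking a few files that way; the function names are hypothetical. Two caveats: data already cached in RAM (the ARC) can be served without going back to disk, and reads were triggering new errors while the HBA problem was live, so this is only for after the fix. zpool status -v remains the place to confirm.

```python
import errno
from typing import Dict, Iterable, Optional


def read_fully(path: str, chunk_size: int = 1 << 20) -> int:
    """Read every byte of `path` in chunks; return the byte count (raises OSError on failure)."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return total
            total += len(chunk)


def check_readable(paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each path to None if it read cleanly, else the errno name (e.g. 'EIO')."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            read_fully(path)
            results[path] = None
        except OSError as exc:
            # Fall back to the message text if the errno is missing or unknown.
            results[path] = errno.errorcode.get(exc.errno or -1, str(exc))
    return results
```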

I’ll keep going from here, but I ‘think’ we might have it. More testing to come, but it’s 100% better than where I was.

Thanks a million, @Fleshmauler, for the idea/tip. I think you might have hit the nail on the head so far. (crosses fingers)


Glad you got on the right track, but let this be a lesson for you.

Unless you don’t care about the data, don’t try to do things you don’t fully understand without having a backup.

You got lucky, but many, many, many before you weren’t so lucky and lost their pools.

Agreed, and lesson learned. I certainly see your point. Thanks!