Using Thunderbird to access data stored on an iSCSI share on TrueNAS: deleting any email causes a reboot of the TrueNAS server

Definitely a strange one here.

I have several VMs and devices connecting to TrueNAS SCALE with various services and protocols. Most are working flawlessly and haven’t changed for months, other than TrueNAS updates.

Strangely, when I try to do certain things with my email client (Thunderbird on Windows 11), which accesses its data over iSCSI on the TrueNAS SCALE server (ElectricEel-24.10.2, running inside a Proxmox VM), the TrueNAS VM hard resets. I don’t see any crash indicator, and nothing is sent to my Graylog server. It is extremely repeatable. It also happened on ElectricEel-24.10.1; I updated to see whether that would resolve the issue. Everything was working fine yesterday. The specific things I have found to cause this reboot:
- Marking an email as read
- Deleting any email
- Compacting my ‘Local Folders’

When new mail is received, Thunderbird updates the files on the iSCSI drive just fine.

As of now, I’m not seeing any ZFS or HDD errors, but I can’t completely rule them out. I was able to copy all data from the iSCSI share to my Windows 11 PC, and creating and deleting files on the iSCSI drive works fine.
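For reference, the ZFS-level checks I mean are basically these:

```
# Per-device read/write/checksum error counters and overall pool health
zpool status -v

# Quick "anything unhealthy?" check
zpool status -x

# Events ZFS has logged (checksum errors, device faults, etc.)
zpool events -v | less
```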

I guess I am just looking for ideas to troubleshoot this. Maybe there is a log I’m not thinking of?
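To clarify the kind of logs I’ve been looking at so far, it’s roughly this (the Proxmox VM ID is a placeholder):

```
# On the TrueNAS VM: kernel messages and the previous boot's journal
dmesg -T | tail -n 100
journalctl -b -1 -p err        # needs a persistent journal to survive the reset

# TrueNAS middleware log
tail -n 200 /var/log/middlewared.log

# On the Proxmox host, around the time of the reset
journalctl --since "1 hour ago" | grep -iE 'qemu|kvm'
qm status <vmid>
```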

I just wanted to give an update on this, as I don’t believe it was a TrueNAS issue.

Upon further investigation and diagnosis, my entire dataset (or zpool) was roached.

As with most every IT issue I’ve encountered that isn’t obvious, I can’t tell you what caused it, but I can tell you the methodology I used to try to find out, and how I eventually fixed it.

My zpool (10 spinning disks plus 6 NVMe SSDs: 2 cache, 2 SLOG, and 2 for the special small-blocks vdev) would lock up the entire system when trying to write certain data to it. I could copy 15 GB of files to it fine, then a random 5 MB picture would cause the issue. Those are just examples…
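To put that layout in context, it’s roughly equivalent to a pool built like this; the pool name, device names, and the exact raidz grouping below are placeholders, just to illustrate the vdev classes:

```
# 2x cache (L2ARC), 2x SLOG (log vdev), 2x special small-blocks vdev
zpool create tank \
  raidz2 sda sdb sdc sdd sde \
  raidz2 sdf sdg sdh sdi sdj \
  special mirror nvme0n1 nvme1n1 \
  log mirror nvme2n1 nvme3n1 \
  cache nvme4n1 nvme5n1
```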

I did a scrub, which found no issues.
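The scrub itself was nothing special (pool name is a placeholder):

```
zpool scrub tank
# Check progress and the final result once it completes
zpool status -v tank
```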

I have a weekly replication task to a backup server, which I ran without issue. Reads were not a problem.
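The task is configured in the TrueNAS UI, but as I understand it, under the hood it boils down to incremental snapshot sends along these lines (pool and snapshot names are placeholders):

```
# First run: full send of the latest snapshot
zfs send -R tank/Main@auto-week1 | ssh backup zfs recv -F backuppool/Main

# Weekly runs after that: only the delta between the last common snapshot
# and the newest one
zfs send -R -I tank/Main@auto-week1 tank/Main@auto-week2 \
  | ssh backup zfs recv backuppool/Main
```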

The system has 384 GB of ECC memory, and all the other VMs and containers never had an issue.

There weren’t any system reboots, updates, etc. done in the days leading up to this occurring.

Because I had originally passed only the spinning disks from the Proxmox server through to the TrueNAS VM (the NVMe drives were all passed directly via PCIe), I adjusted my setup and passed the entire controller (a Broadcom/LSI MegaRAID SAS-3 3108 in a Dell R730XD, in HBA mode) directly to the VM. The same issue persisted.
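For anyone wanting to do the same thing, passing the whole controller through on the Proxmox side looks roughly like this (the VM ID and PCI address are placeholders, not my actual ones):

```
# Find the controller's PCI address (IOMMU/VT-d must already be enabled)
lspci -nn | grep -Ei 'lsi|megaraid'

# Hand the whole device to the TrueNAS VM (pcie=1 needs the q35 machine type)
qm set <vmid> --hostpci0 0000:03:00.0,pcie=1
```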

I then tried exporting the zpool and importing it directly into a bare-metal TrueNAS install (same version). It would not import; I got the error: “The metadata required to open the pool is corrupt”. Which is bizarre, as all drives were shown as online, and there had been no errors while the pool was in the other virtual machine. Plus, when I ran the import with the -n flag, it gave no errors.
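For the record, the sequence on the bare-metal box was roughly this (pool name is a placeholder):

```
# See what's importable; everything showed up with all vdevs ONLINE
zpool import

# The actual import is what failed with the metadata error
zpool import tank

# Dry run of a recovery/rewind import; -n means "check, but don't actually do it"
zpool import -F -n tank
```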

All drives tested fine with everything I threw at them.
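For what it’s worth, by “tested” I mean this sort of thing (device names are placeholders):

```
# Long SMART self-test, then review the attributes/results afterwards
smartctl -t long /dev/sdX
smartctl -a /dev/sdX

# Non-destructive full read of the device to shake out unreadable sectors
dd if=/dev/sdX of=/dev/null bs=1M status=progress
```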

So I jumped to the last resort: did a final replication to my backup TrueNAS server, wiped the drives, and re-created the whole pool on the same drives. Replicating everything back from the backup server took forever.

I’m not sure why TrueNAS replication (even using SSH+netcat) only averages around 3.5 Gbit/s on a 10 Gbit network, but that seems to always have been the case for me. 30 TB+ of data isn’t quick to work with anyway…
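If anyone wants to compare notes, the quick way I know to rule out the network itself is iperf3 between the two boxes (hostname is a placeholder):

```
# On the backup server
iperf3 -s

# On the live server; -P 4 runs four parallel streams
iperf3 -c backup-server -P 4
```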

So now everything SEEMS OK, but I need to go through the restored data and ensure it’s all there… 30+ years’ worth of data. On top of that, replication from the live server to the backup server no longer works the way it did, and it wants to replicate from scratch. I still haven’t figured out how to restore replicated data and then keep updating the backup from the live server without replicating from scratch. If someone has any advice on what I’m doing wrong, I’d appreciate it. I get the error: “No incremental base on dataset ‘Main’ and replication from scratch is not allowed”.
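As I understand it, that error means the live server and the backup server no longer share a common snapshot to use as the incremental base. One way to check (pool names below are placeholders; ‘Main’ is the dataset from the error):

```
# On the live server: snapshots of the restored dataset, oldest to newest
zfs list -t snapshot -o name -s creation tank/Main

# On the backup server: snapshots of the replica
zfs list -t snapshot -o name -s creation backuppool/Main

# Incremental replication can only resume if both lists share at least one
# snapshot name; otherwise the task has no base and wants to start from scratch.
```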

So yeah, TL;DR:

Seems like a fluky ZFS error.
Had to restore from backups due to my zpool being corrupted for some completely unknown reason.