Scale 24.10.0.2 crashes on upload to encrypted storage

My system:

  • Gigabyte 970A-UD3P Rev 2.0, BIOS last updated in 2016 and edited for NVMe boot capability
  • FX-6300, liquid cooled
  • Realtek onboard NIC only
  • 600 W PSU
  • 32 GB (4x 8 GB) 1600 MHz Vengeance DDR3
  • Nvidia GeForce GTX 960 on PCIe x16
  • 128 GB NVMe on PCIe x16 (x4)
  • Onboard SATA: 3x 12 TB, configured as a mirror with one on standby (10 TiB).
    2x WD SSD 500 GB mirror for apps and fast storage needs.

My issue: when I initially created the pool, I did not encrypt it. I thought this would not be a problem (and it probably still isn’t). However, when I create an encrypted dataset and begin to move data to it, my server crashes. I have mostly tested this over SMB, and the crashes happen when SMB is involved.

Example one:
Windows 10 Pro File History to an encrypted dataset over SMB. When I initiate the move in Windows, it works for roughly 1 minute, then I get an error that the drive cannot be found. The server reboots with the transfer only partially complete, and it will not stay running.

Example two:
I have tried Nextcloud with similar results. I create an encrypted dataset and add it to Nextcloud inside the container config (as a host path). I can transfer small amounts, but when I try to back something up fully, the same thing happens: the server reboots and will not stay running.

Seeing this happen a bunch, I thought: oh man, some piece of this kit is failing. So I ran SMART tests, memtest86, Prime95, a re-install, re-installs on SSD and HDD, and many, many more tests. The PSU tests well, and I am not receiving messages about power anywhere in the debugs.

If I do the same thing to an unencrypted dataset on the same pool, it works with no problem.

I backed up over 2 TB that way with no problem. So what’s going on here? Is there some sort of permission issue that could be taking place? Should I have encrypted the whole pool? Is this my onboard NIC? I have seen others saying they didn’t have any errors either, but their server would hang. I seem to be having something similar happen. What can I do to test these theories and get meaningful results?

This system, albeit old, has passed every hardware check I’ve done, and I have installed other UEFI systems and found no stability issues. I’m really lost here. I’ve been messing with this thing for over a year now and simply can’t rely on it for anything. Pretty frustrating.
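One way to separate the SMB/NIC theory from the encryption theory might be to write a large file to the encrypted dataset locally on the server, with no network involved. The sketch below is only an assumption about how such a test could look; the function name, chunk sizes, and any path passed to it are hypothetical. It writes random data (so compression cannot shrink it) and prints progress, so the last line visible on the console or SSH session shows how far it got before a reboot.

```python
import os
import time


def write_test_file(path, total_bytes, chunk_bytes=1024 * 1024, sync_every=64):
    """Write total_bytes of random data to path and return the bytes written.

    Data is fsynced every `sync_every` chunks so it is actually pushed
    to the pool instead of sitting in memory.
    """
    written = 0
    chunks = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            size = min(chunk_bytes, total_bytes - written)
            f.write(os.urandom(size))  # random data defeats compression
            written += size
            chunks += 1
            if chunks % sync_every == 0:
                f.flush()
                os.fsync(f.fileno())
                elapsed = time.monotonic() - start
                print(f"{written} bytes written after {elapsed:.1f}s", flush=True)
        # Final sync for the tail that did not fill a whole batch.
        f.flush()
        os.fsync(f.fileno())
    return written
```

Calling something like `write_test_file("/mnt/tank/encrypted/local_write_test.bin", 8 * 1024**3)` from a shell on the server (the path is a placeholder for the encrypted dataset's mount point) would exercise the encrypted write path alone. If the server still reboots, SMB and the NIC look less likely to be the cause; if it survives, they become more interesting.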

24.10.1 contained quite a few ZFS bug fixes… at least 1 related to encryption.

Update to 24.10.2 as a starting point… then see if problem persists.

That is the current version I am on; next would be Fangtooth. Upgrading hasn’t changed anything for several versions now. This is why I was looking into hardware and system stability so much. Maybe I missed something?

Have you noticed if your CPU is under heavy load when uploading before the crash?

Do you have dedup and/or any specific compression on the dataset other than the default lz4?

Your thread title says 24.10.0.2 and the current version is 24.10.2. So the suggestion was to upgrade to current if you are not on it (or just fix your thread title if you’re already on 24.10.2).

@DjP-ix my mistake, I am on what the post title states. It is not reporting correctly on the dashboard, and when I try to update, it gets to extracting the filesystem and then just shuts off. I have my capture card plugged into the server and it gives no indication it’s about to crash. Debug shows nothing of value. Syslog is corrupted and cleaned every time it reboots.

I have set up another system and it seems stable. I have not tried to import a pool from the first system yet to test. Not sure it’s worth it.

I was able to replicate files from a non-encrypted to an encrypted file path on the new system. I am leaning towards the CPU, but would really like some material evidence rather than “I do things and it reboots.”
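Since syslog does not survive the reboot, one possible way to collect evidence is a small heartbeat logger that appends a timestamp and the load averages to a file and fsyncs after every line, so the last entry before the reset should still be on disk afterwards. This is a minimal sketch, not a diagnostic tool: the function names and the log location are hypothetical, and it only shows whether load was high right before the crash, not why the machine reset.

```python
import os
import time


def format_sample(timestamp, load1, load5, load15):
    """Build one log line from a Unix timestamp and the three load averages."""
    return f"{timestamp:.0f} load1={load1:.2f} load5={load5:.2f} load15={load15:.2f}\n"


def append_durably(path, line):
    """Append a line and fsync it so it is on disk before the next sample."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())


def run_heartbeat(path, interval_s=1.0, max_samples=None):
    """Log load averages every interval_s seconds; return the sample count.

    With max_samples=None this runs until the process (or the machine) stops.
    """
    count = 0
    while max_samples is None or count < max_samples:
        load1, load5, load15 = os.getloadavg()  # available on Linux/Unix
        append_durably(path, format_sample(time.time(), load1, load5, load15))
        count += 1
        time.sleep(interval_s)
    return count
```

Running `run_heartbeat(...)` with a log path on the unencrypted dataset (which has been stable) while a transfer to the encrypted dataset is in progress would leave a record whose final timestamp and load values can be compared against the moment the server went down.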