Howdy, Forum!
I’m a newbie working on converting from a Synology DiskStation to TrueNAS SCALE 24.10.0.2 on home-lab hardware (bare metal). I recently began migrating my media library using rsync over SSH, and that is when the crashes started. After a lot of testing, I’ve found that I can fairly reliably force a system reboot after copying between ~64 and ~80 GB to separate file handles. Examples are below. I can’t find any error in the system logs. My box has a BMC with IPMI, so I set up a screen recording of the virtual console to see whether anything was printed to the console before the crash, and the answer was ‘no’. It goes straight from a running system to POST with no warning.
Some tests I’ve run (using my desktop’s /dev/urandom as a data source):
Streaming data over SSH to a single file with no size limit - Reboot after writing ~70 GB
Streaming data over SSH to multiple files with sizes between 1 and 20 GB - Reboot after writing ~70 GB
Streaming data over NFS to multiple files with sizes between 1 and 20 GB - Reboot after writing ~70 GB
Streaming ~5 GB of data per file over SSH to multiple files, with a 2-minute sleep between copies - Reboot after writing ~70 GB
Streaming ~5 GB of data per file over SSH to multiple files, with a 60-minute sleep after 10 files - Reboot after writing ~70 GB
Streaming data over SSH to a single file, repeatedly writing/overwriting between 1 and 20 GB to the same file handle - No reboot; I halted the test after ~100 GB
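For anyone who wants to reproduce this, here is a rough Python sketch of the "multiple files between 1 and 20 GB" test, with an optional pause between files and a flag for the overwrite-the-same-file variant. It is a hypothetical harness, not the exact commands I ran: it writes `os.urandom` data (the same kernel source as /dev/urandom) into a directory, which could be an NFS mount of the target dataset, and prints a running total after every file so the last line before a reboot shows roughly how much had been written. The function names and file names are made up for illustration.

```python
import os
import random
import time

GIB = 1024 ** 3

def plan_file_sizes(total_bytes, min_size, max_size, rng):
    """Pick random sizes in [min_size, max_size] until they sum to total_bytes.

    The last file is trimmed so the sum is exact, so it may be smaller than min_size.
    """
    sizes = []
    remaining = total_bytes
    while remaining > 0:
        size = min(rng.randint(min_size, max_size), remaining)
        sizes.append(size)
        remaining -= size
    return sizes

def write_random_file(path, size, chunk_size=4 * 1024 * 1024):
    """Write `size` random bytes (from os.urandom) to `path`, truncating any existing file."""
    written = 0
    with open(path, "wb") as f:
        while written < size:
            chunk = os.urandom(min(chunk_size, size - written))
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the target storage before moving on
    return written

def run_write_test(target_dir, total_bytes, min_size, max_size,
                   pause_s=0.0, same_file=False, seed=None):
    """Write random files into target_dir and log the running total after each one.

    same_file=True overwrites one file repeatedly (like the no-reboot test);
    otherwise each write goes to a new file.
    """
    rng = random.Random(seed)
    total = 0
    for i, size in enumerate(plan_file_sizes(total_bytes, min_size, max_size, rng)):
        name = "stress_0000.bin" if same_file else f"stress_{i:04d}.bin"
        path = os.path.join(target_dir, name)
        total += write_random_file(path, size)
        # flush=True so the latest progress line is already on screen (or in a
        # redirected log) if the connection drops when the server reboots
        print(f"{path}: {size} bytes, running total {total / GIB:.2f} GiB", flush=True)
        if pause_s:
            time.sleep(pause_s)
    return total
```

For example, running `run_write_test("/mnt/nas_stress", 100 * GIB, 1 * GIB, 20 * GIB, seed=1)` from the desktop against a hypothetical NFS mount point would roughly mirror the NFS test, `pause_s=120` mirrors the 2-minute-sleep test, and `same_file=True` mirrors the overwrite test that didn't reboot.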
Tests I want to run but haven’t gotten to yet:
“Stress”-testing READ rather than WRITE
Local copy (dataset to dataset, and/or from a USB drive)
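For the READ stress test, a minimal sketch (again hypothetical, with made-up function names) could read every file in a directory sequentially, hash it with SHA-256, and repeat for several passes, flagging any file whose hash changes between passes. One assumption worth noting: re-reads may be served from ZFS's ARC in RAM rather than from the SSDs, so pointing it at a set of files larger than the box's 64 GB of RAM is probably needed to actually exercise the disks.

```python
import hashlib
import os

GIB = 1024 ** 3

def read_and_hash(path, chunk_size=4 * 1024 * 1024):
    """Read a file sequentially in chunks; return (bytes_read, SHA-256 hex digest)."""
    h = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
            total += len(chunk)
    return total, h.hexdigest()

def run_read_test(source_dir, passes=1):
    """Read every regular file in source_dir `passes` times.

    Logs a running total and returns (total_bytes_read, names whose digest
    changed between passes).
    """
    total = 0
    digests = {}
    mismatches = []
    for p in range(passes):
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            if not os.path.isfile(path):
                continue
            n, digest = read_and_hash(path)
            total += n
            # The first pass records each digest; later passes must match it.
            if digests.setdefault(name, digest) != digest:
                mismatches.append(name)
            print(f"pass {p + 1}: {path}, running total {total / GIB:.2f} GiB", flush=True)
    return total, mismatches
```

Pointing `run_read_test` at the files left behind by a write test, with `passes=2` or more, would read back well over the ~70 GB mark without writing anything new.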
The target datasets are on a pool of 4 physical SSDs arranged as 2 mirrored vdevs, all assigned to ‘Data’.
I love weird edge-case problems like this, so I’m game to keep tinkering if anyone has ideas or suggestions. I’m also open to trying CORE instead of SCALE after assessing any caveats. I mainly picked SCALE because Linux is in my wheelhouse, but I don’t think that matters much when dealing with an appliance-grade OS.
Edit: Including hardware specs, per @SmallBarky
Proc: AMD EPYC 4464P
Board: AsRockRack B650D4U (Using on-board NICs and SATA)
RAM: 2x 32 GB ECC Unbuffered DIMM (UDIMM)
Boot Disks: 2x 500 GB NVMe
Data Pool: 2x Samsung SSD 870 (4 TB), 1x TEAM T2532TB (2 TB), 1x SPCC Solid State (2 TB)