Strange reboot on Scale (24.04.1) - error log empty

I have strange reboots on two (similar) machines when doing the following:

Using a Windows PC, I copy a folder on the share into another folder on the same share.
Let’s say the share is data1; then I copy from truenas.local/data1/tmp1 to data1/tmp2.

TrueNAS itself does not make real copies but references (you can see the storage size growing). So far, so normal.
After a certain amount of data (for me between 1 and 6 TiB) the copy stops because the TrueNAS screen (the one attached to the machine, not the web interface) goes black. Then the machine reboots.
In the logs there is no sign whatsoever of what happened. You can see the boot process but not what caused the reboot. There is nothing at the exact timestamp. I have tried several times now, and since I have a second board/CPU/RAM/pool I tried that too. Same result.

When copying data to TrueNAS via Ethernet (from another machine to the TrueNAS share) nothing strange happens (copied >40TiB so far).
Also, if I use cp to do the same thing I described above (copy tmp1 to tmp2), I was able to copy the whole amount of data (tens of terabytes). Multiple times.
It’s not the PSU (which was my first guess). And I tried both the onboard NIC and the NIC card. Still the same.
This is pretty strange. I am able to reproduce it now and still have no idea what’s going on.

Machine 1 is:
Intel S1200SP, Xeon E3-1240v5, 16GB ECC RAM, Corsair 550W PSU, Mellanox ConnectX4
Machine 2 is:
Intel S1200SPL, Xeon E3-1240v6, 64GB ECC, 650W Seasonic Focus PSU, LSI 9211-8i, Mellanox ConnectX3

Maybe someone can try this on their machine. Sometimes it takes two attempts when the amount of copied data is not enough; with 10TiB I can reproduce it 100% of the time. It happens at some point during the process.

In the meantime I am trying to install Core to verify whether the problem is gone. This takes some time because the pool from Dragonfish cannot be imported in Core U6. Seems like a ZFS feature mismatch or similar. So I have to fill a new pool with files.

Update:
Tested the same on one machine with TrueNAS CORE 13.0-U6.1.
I have been running the copy on the share for 2h and the machine is still running. Seems to work.
The next step is to go back to Scale 24 to check whether this pool shows the problem described above (cross-check), and then test again on Scale 23.10.

You should report a bug… if you can make the machine available for our engineers to resolve.

It seems like a server side copy issue… 24.04 does this differently.
Can you post the NAS-ticket ID and make sure you provide a way of replicating if you can.

First, I noticed that copying on Scale 24 shares via shell or Windows does not shrink the available space (I have not activated dedup), while it actually reduces free space in Core U6. I guess this is the new feature.
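If someone wants to put numbers on this free-space observation, here is a rough Python sketch. It only measures free space on a mount point before and after some copy action that you supply yourself. The function name and the idea of passing the copy as a callable are my own choices for illustration, nothing from TrueNAS. Note that ZFS may report space changes with a short delay, so treat the result as approximate.

```python
import shutil
from pathlib import Path
from typing import Callable


def free_space_consumed(mount_point: Path, action: Callable[[], None]) -> int:
    """Run `action` and return how many bytes of free space it consumed.

    A value close to the size of the copied data suggests a real copy;
    a value close to zero suggests the copy was done by reference.
    """
    before = shutil.disk_usage(mount_point).free
    action()  # the copy to measure, e.g. a subprocess call to cp
    after = shutil.disk_usage(mount_point).free
    return before - after
```

For example, `action` could be `lambda: subprocess.run(["cp", "-r", src, dst], check=True)` with `src` and `dst` pointing at tmp1 and tmp2 on the pool (you would need to import subprocess for that).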

I have a lot to do, so I cannot fully test everything and I have to see what I can provide. I want to use this system in production soon, so it bothers me a lot that I cannot start copying my production data to the pool, which I must do in the next few days. So if testing today confirms that Scale 23.10 works, I will use that in the meantime.

For testing you can use this:

1.) Install Scale 24.
2.) Create a pool under Scale 24 (does not matter if Z2 or mirror).
3.) Create an SMB share and activate the SMB service.
4.) Then copy a lot of data to it (best around 4-10TiB).
5.) Use Windows Explorer on a client machine to copy (not move) inside the same share from one folder to another, empty folder.
6.) Watch the monitor attached to the TrueNAS machine go black and the machine reboot.
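For step 4, if you have no spare data lying around, something like the following Python sketch could fill a folder with test files. This is just one way to do it and not what I used myself. The function name, file naming scheme and default sizes are made up for illustration. It writes random data so that compression on the dataset cannot shrink it.

```python
import os
from pathlib import Path


def fill_folder(target: Path, total_bytes: int,
                file_bytes: int = 1 << 30, chunk_bytes: int = 1 << 20) -> int:
    """Write random files into `target` until `total_bytes` are written.

    Returns the number of files created. Files are `file_bytes` large,
    except the last one, which holds whatever remains.
    """
    target.mkdir(parents=True, exist_ok=True)
    written = 0
    count = 0
    while written < total_bytes:
        size = min(file_bytes, total_bytes - written)
        path = target / f"testfile_{count:06d}.bin"
        with open(path, "wb") as fh:
            remaining = size
            while remaining > 0:
                n = min(chunk_bytes, remaining)
                fh.write(os.urandom(n))  # random bytes, so not compressible
                remaining -= n
        written += size
        count += 1
    return count
```

A call such as `fill_folder(Path("/mnt/tank/data1/tmp1"), 4 * 1024**4)` would write about 4TiB in 1GiB files. The path is a placeholder; adjust it to your own pool and share.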

Edit: I have now filed a bug report.
Edit2: The pool from Core U6 that I imported on Scale 24 works (I didn’t upgrade it). This (in my opinion) points to ZFS feature set problems. Copy speed is much slower (600MB/s vs 2-3 GB/s). This may also explain why not many have encountered it, since I would not upgrade the pool when updating/upgrading TrueNAS for now.
Edit3: NAS Ticket ID: NAS-129463

Yes, it is block cloning. It’s a special type of dedup where copies are made through metadata without new physical copies of the data.
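To see how much block cloning a pool is actually doing, OpenZFS 2.2 exposes the pool properties bcloneused, bclonesaved and bcloneratio. Below is a small Python sketch that reads them through `zpool get`. The helper names are hypothetical and the pool name you pass in is up to you. I am assuming the tab-separated output of scripted mode (`-H`) here; check it on your own system before relying on it.

```python
import subprocess

BCLONE_PROPS = ("bcloneused", "bclonesaved", "bcloneratio")


def parse_zpool_get(output: str) -> dict[str, str]:
    """Parse tab-separated 'property<TAB>value' lines into a dict."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        prop, value = line.split("\t", 1)
        result[prop] = value.strip()
    return result


def bclone_stats(pool: str) -> dict[str, str]:
    """Query the block cloning properties of `pool` via zpool get."""
    cmd = ["zpool", "get", "-H", "-p", "-o", "property,value",
           ",".join(BCLONE_PROPS), pool]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_zpool_get(out)
```

Calling `bclone_stats("tank")` (with "tank" replaced by your pool name) before and after a copy inside the share should show whether the copy was cloned or written out in full.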