Hi,
I had a lot of fun yesterday (not): what started as “find out why the box crashes on data deletion” turned into fixing a “TrueNAS crashes on boot” loop.
Long story short, it seems the metadata devices on my main pool have some issues, so heavy I/O to them (data deletion) caused the system to reboot.
After multiple attempts at isolating the root cause (swapping the HBA, moving drives around in the chassis/backplanes, swapping PSUs), one of my three metadata vdevs (3x 2-way mirror) decided to act up even more and now causes a reboot already on import (instead of only on write).
At the moment I have imported the pool read-only and am copying off as much as I can before recreating it. Of course I don’t have a backup, since I am/was in the middle of deduplicating 3 backup servers into one new big main pool, which involves a lot of disk shuffling and left me with no backups. Not smart in hindsight.
Pool layout
  pool: tank18t
 state: ONLINE
  scan: resilvered 2.09M in 00:00:00 with 0 errors on Sun Apr 19 21:17:52 2026
config:

    NAME                                      STATE     READ WRITE CKSUM
    tank18t                                   ONLINE       0     0     0
      raidz2-0                                ONLINE       0     0     0
        7d9a5851-c812-4f77-991e-eeef004b70c7  ONLINE       0     0     0
        08ce322c-423b-4620-a298-5caf1083f623  ONLINE       0     0     0
        82be3c4b-6def-4c30-83be-20e224d10a27  ONLINE       0     0     0
        9af43f3a-ea81-4559-925a-74eb744cddbe  ONLINE       0     0     0
        fcfff8fa-9218-49cf-9081-9b2470be5908  ONLINE       0     0     0
        79d77d63-6fc0-46b2-8cbb-8c96553ededa  ONLINE       0     0     0
        619ce353-29c0-498e-bb8f-8c8564a7387b  ONLINE       0     0     0
        4a2a1378-9bf7-4ea9-bacf-92b4094605e8  ONLINE       0     0     0
    special
      mirror-1                                ONLINE       0     0     0
        a65f45a9-daf9-437e-aeb4-3a8ddc20db7a  ONLINE       0     0     0
        921db10e-a1f8-4bc4-b48f-f2b535002af5  ONLINE       0     0     0
      mirror-2                                ONLINE       0     0     0
        7f48692b-2ff3-435e-ae56-e9f3bd6eb852  ONLINE       0     0     0
        7b00d031-35bb-48eb-93e2-51a16ad87ad3  ONLINE       0     0     0
      mirror-3                                ONLINE       0     0     0
        244f25cc-be3a-409d-a670-fb21d0ed49fc  ONLINE       0     0     0
        da55dab6-ce08-4d64-b449-fdc1d3f9ba7d  ONLINE       0     0     0

errors: No known data errors
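For anyone who wants to map the special (metadata) mirrors back to physical drives, here is a minimal Python sketch that pulls the vdev membership out of `zpool status` text like the above. The function name `parse_layout` and the `SAMPLE` excerpt are just illustrative, not anything TrueNAS ships. It works on the first token of each line only, so it also copes with output that lost its indentation, but for the same reason it cannot tell a single-disk vdev apart from a member of the mirror listed before it.

```python
import re

# Section headers that zpool status prints for non-data vdev classes.
CLASS_NAMES = {"special", "logs", "cache", "spares", "dedup"}
# Grouping vdevs such as mirror-1, raidz2-0 or draid names ending in -N.
VDEV_RE = re.compile(r"^(mirror|raidz[123]?|draid\S*)-\d+$")


def parse_layout(status_text, pool):
    """Return {class: {vdev: [member, ...]}} from `zpool status` text."""
    layout = {}
    cls = "data"      # vdevs listed before any class header hold normal data
    vdev = None
    in_config = False
    for line in status_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        name = tokens[0]
        if name == "NAME":            # header row marks the start of the config table
            in_config = True
            continue
        if not in_config or name == pool:
            continue
        if name.startswith("errors:"):
            break
        if name in CLASS_NAMES:
            cls, vdev = name, None
        elif VDEV_RE.match(name):
            vdev = name
            layout.setdefault(cls, {})[vdev] = []
        else:
            # Leaf device; a leaf with no grouping vdev is keyed by its own name.
            layout.setdefault(cls, {}).setdefault(vdev or name, []).append(name)
    return layout


# Shortened excerpt of the output above (only two raidz2 members, one mirror).
SAMPLE = """\
pool: tank18t
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
tank18t ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
7d9a5851-c812-4f77-991e-eeef004b70c7 ONLINE 0 0 0
08ce322c-423b-4620-a298-5caf1083f623 ONLINE 0 0 0
special
mirror-1 ONLINE 0 0 0
a65f45a9-daf9-437e-aeb4-3a8ddc20db7a ONLINE 0 0 0
921db10e-a1f8-4bc4-b48f-f2b535002af5 ONLINE 0 0 0
errors: No known data errors
"""

if __name__ == "__main__":
    for vdev_name, members in parse_layout(SAMPLE, "tank18t")["special"].items():
        print(vdev_name, members)
```

The partition UUIDs can then be matched against the entries under `/dev/disk/by-partuuid/` to find out which physical drive sits in which mirror.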
Now, with that background, to my actual question: is there any explanation other than defective drives that could cause the system to reboot on writes without a log entry?
So far I have:
- swapped the PSU for a bigger one (500 W to 1 kW); the old one was probably a bit underpowered (10 SATA spinners, 25 SATA SSDs, an A2000, X12SCZ-F/1290P, an X710 NIC and an M.2 NVMe in an 847 chassis)
- kept the same PDB, but it should be fine here; it is designed to run 36 spinners after all
- moved the drives between the front and back backplanes (EL1s) and into different slots
- swapped the HBA (9300-16i to 24i and back to the 16i)
- moved everything to another system (an 846 on older hardware), but only after the reboot loop had started
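To put a rough number on the "probably underpowered" PSU point, a back-of-the-envelope power budget can be sketched in Python as below. Every wattage in `EXAMPLE_LOAD` is a placeholder guess of mine, not a measurement or datasheet value for this hardware, and the 80 % sustained-load ceiling is a common rule of thumb rather than a spec. Swap in the real numbers before drawing any conclusion.

```python
def total_watts(components):
    """Sum count * per-unit watts over a {name: (count, watts)} mapping."""
    return sum(count * watts for count, watts in components.values())


def headroom(psu_watts, load_watts, max_load_fraction=0.8):
    """Watts left below the chosen sustained-load ceiling (negative = over budget)."""
    return psu_watts * max_load_fraction - load_watts


# ASSUMPTION: placeholder per-device figures, to be replaced with datasheet values.
EXAMPLE_LOAD = {
    "sata_hdd_at_spinup": (10, 25),
    "sata_ssd_under_write": (25, 5),
    "gpu": (1, 70),
    "cpu": (1, 125),
    "board_nic_nvme_hba_fans": (1, 80),
}

if __name__ == "__main__":
    load = total_watts(EXAMPLE_LOAD)
    for psu in (500, 1000):
        print(f"{psu} W PSU: load {load} W, headroom {headroom(psu, load):.0f} W")
```

Note that the spin-up figure is a worst case that mostly applies at power-on; for crashes during writes, the SSD write load and any transient peaks are the more relevant terms.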
The drives (Micron 5100, 3.84 TB) were used in a previous build for a year without anything noteworthy. All drives have the same firmware, and I have no known issues with any of the others.
The short SMART tests come back fine. I will run the long test after I have evacuated the data.
Looking for ideas why this happens…
Bad luck putting two defective drives in a single mirror vdev? Or some other issue that merely triggered the chain of events…
Edit: I need a way to identify why those drives act the way they do, so I can test the others for the same issue before I use them as metadata devices in the next pool…
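One thing I could try for screening the remaining drives is comparing their SMART attribute raw values against each other and flagging anything that sticks out. Below is a minimal Python sketch of that idea. It assumes smartmontools 7.0 or newer (for `smartctl -A -j` JSON output) and that the attributes sit under `ata_smart_attributes.table` with `name` and `raw.value` fields; the function names, the `factor`/`floor` thresholds and the device paths are my own illustrative choices. It will not prove a root cause and cannot catch a drive that misbehaves without leaving SMART traces.

```python
import json
import statistics
import subprocess


def read_smart_attributes(device):
    """Return {attribute_name: raw_value} for one ATA drive via smartctl JSON output."""
    # ASSUMPTION: smartmontools >= 7.0; smartctl's exit status is a bitmask, so it is not checked.
    result = subprocess.run(
        ["smartctl", "-A", "-j", device], capture_output=True, text=True, check=False
    )
    table = json.loads(result.stdout).get("ata_smart_attributes", {}).get("table", [])
    return {row["name"]: row["raw"]["value"] for row in table}


def find_outliers(drives, factor=3.0, floor=10):
    """Flag (drive, attribute, value, median) where value is far above the group median.

    `drives` maps a drive label to its {attribute_name: raw_value} dict.
    A value is flagged when it exceeds both median * factor and median + floor.
    """
    outliers = []
    names = set().union(*(attrs.keys() for attrs in drives.values()))
    for name in sorted(names):
        values = [attrs[name] for attrs in drives.values() if name in attrs]
        if len(values) < 3:
            continue  # too few drives report this attribute to compare
        median = statistics.median(values)
        threshold = max(median * factor, median + floor)
        for drive, attrs in sorted(drives.items()):
            value = attrs.get(name)
            if value is not None and value > threshold:
                outliers.append((drive, name, value, median))
    return outliers


if __name__ == "__main__":
    # Hypothetical device list; replace with the drives to be screened.
    devices = ["/dev/sda", "/dev/sdb", "/dev/sdc"]
    readings = {dev: read_smart_attributes(dev) for dev in devices}
    for drive, name, value, median in find_outliers(readings):
        print(f"{drive}: {name} = {value} (group median {median})")
```

Some raw values pack several numbers into one field (temperatures, for example), so any hit needs a manual look before blaming the drive. Attributes like CRC error counts would point at cabling or backplane rather than the drive itself.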
Thanks
P.S. I am on 25.04.2.6; an initial 25.10 attempt didn’t go so well, so I decided to let that mature some more. (Although the A2000 doesn’t seem to work anymore on the latest update either, but that’s another issue.)