TrueNAS SCALE falling over after large transfers (with Intel NIC)

It was doing pretty well yesterday: stayed up all day even with Plex installed, streamed the entirety of Dune Part Two with no crash, and then it randomly rebooted at 3am again today :smiling_face_with_tear:

If there is no hardware watchdog, then we can only assume that something in TrueNAS is triggering a reboot, but I have no idea what it might be.

But if there is nothing in the files in /var/log that suggests a cause, then IMO you need to try turning off the automatic reboot and attaching a monitor, so that when it hangs or crashes in the middle of the night any console messages stay on screen and can be viewed in the morning.
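Before giving up on /var/log, it can help to search the files mechanically rather than by eye. The following is a minimal sketch, not a TrueNAS tool: the pattern list and the helper names `find_suspect_lines` and `scan_file` are assumptions made for illustration, and the patterns should be adjusted to whatever your logs actually contain.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list).
SUSPECT_PATTERNS = [
    r"I/O error",
    r"Out of memory",
    r"Kernel panic",
    r"soft lockup",
]


def find_suspect_lines(lines, patterns=SUSPECT_PATTERNS):
    """Return (line_number, text) pairs for lines matching any pattern."""
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]


def scan_file(path):
    """Scan one log file, replacing undecodable bytes instead of failing."""
    with open(path, "r", errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling `scan_file` on a file under /var/log (the exact file name depends on the system) returns the matching lines with their line numbers, which makes it easier to line them up against the time of a reboot.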

Some more syslogs. Is it maybe my disks? This is weird, though, since the metrics dropout seems to have happened while the system was still logging.

Circling back again: still experiencing the same random reboot issues. I have decided to try an insane step of swapping both the motherboard and CPU with another set that I have.

It might not be, but I started to consider that maybe the board was the problem, since I have heavily used it for dev purposes, including several CPU swaps, and there's a possibility I might have stressed the board when I removed the original CPU fan.

Updating again, mostly just to keep track. Unset the app pool, and the system has stayed up for 5 days and counting. Maybe I was clogging the disks. I have a new board with M.2 slots coming, so I'll move the app pool to one of those and try again.

You appear to have a pair of failing drives. These log messages appear right around the time of the issues you reported, based on the dropout of the metrics and the log entry where the system is restarting.
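The log messages themselves are not reproduced here, but one common way to cross-check a suspected drive is to look at its SMART attributes. The sketch below assumes an ATA/SATA drive and the attribute table printed by `smartctl -A`; NVMe and SAS devices report differently. The watched attribute set and the helper names are assumptions chosen for illustration, not an official health test.

```python
import subprocess

# Attributes often treated as early warnings when their raw value is non-zero
# (an illustrative selection, not a complete health check).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_smart_attributes(text):
    """Map attribute name -> raw value string for rows of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9]
    return attrs


def warning_attributes(text, watched=WATCHED):
    """Return watched attributes whose raw value is a positive integer."""
    warnings = {}
    for name, raw in parse_smart_attributes(text).items():
        if name in watched and raw.isdigit() and int(raw) > 0:
            warnings[name] = int(raw)
    return warnings


def read_smart(device):
    """Run smartctl for one device (needs root and smartmontools installed)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return result.stdout
```

Something like `warning_attributes(read_smart("/dev/sda"))` (the device path is a placeholder) returns an empty dict for a drive with no reallocated or pending sectors. A non-empty result is a reason to look closer, not proof of failure.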

You have a single-vdev, 3-disk raidz1, which only tolerates one drive failure, so two drives failing could indeed cause system instability.
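To make the raidz1 arithmetic concrete, here is a rough sketch. The function names are made up for illustration, and the capacity figure ignores ZFS metadata and padding overhead, so treat it as an approximation only.

```python
def raidz_summary(disk_count, disk_size_tb, parity=1):
    """Rough figures for a single raidz vdev of equally sized disks."""
    if disk_count <= parity:
        raise ValueError("a raidz vdev needs more disks than parity devices")
    return {
        "tolerated_failures": parity,
        # Approximation: one disk's worth of space per parity level goes to parity.
        "approx_usable_tb": (disk_count - parity) * disk_size_tb,
    }


def pool_survives(failed_disks, parity=1):
    """A single raidz vdev stays importable while failures do not exceed parity."""
    return failed_disks <= parity
```

For the 3x 2TB raidz1 in this thread, `raidz_summary(3, 2)` gives one tolerated failure and roughly 4 TB usable, and `pool_survives(2)` is `False`, which is why two failing drives at once is so serious.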

Good spotting.

More to the point, when you lose the first drive and resilver, the stress may well cause the 2nd drive to fail, which will result in total loss of data in the pool.

IMO as a matter of URGENCY @coffeekomrade needs to:

  1. Make a backup immediately of all the data in this pool (or certainly any critical data they cannot afford to lose).

  2. Buy two new drives and swap them out one-by-one, and hope that the 2nd drive doesn't fail during the 1st resilver. Or if they have enough slots, they should resilver with the extra drives alongside.

I was worried about that. I had thought maybe it was my disks, but didn't think it was 2 at once. I ordered 3 new 4TB drives to swap out the current 3x 2TB drives. Luckily this machine is just mirroring my Synology for data, so even if both disks go at once I won't lose anything.

I've got a lot of parts coming in the next week :sweat_smile:

Drives have been replaced and the new motherboard has been installed (I needed those additional slots and M.2s). Things appear good.

Waiting on an LSI 9300-16i HBA to be delivered to add to it.

Speaking of, does anyone know a good place to get white label HDDs? I've seen GoHardDrive, and they have decent pricing but a lot of refurbed drives. I've never bought refurb so not sure what to expect there.

If you don't want refurb, look for enterprise drives (WD Gold = HGST Ultrastar, Seagate Exos, Toshiba MG) rather than "white label" consumer-grade drives. Pricing is usually better than dedicated NAS lines (WD Red, IronWolf, N300).

I'm open to refurb, just not sure what to expect. I see a lot of refurb enterprise drives, mostly HGST Ultrastars, for between $35-60 for 3-4TB drives. I wouldn't mind picking up 5x 3TB for $35/ea, to be honest.

I like goharddrive.com because I've had to make use of their warranty before and it was less painful than that of OEMs. No downloading SeaTools or whatever. Just report the drive, put it in the mail and another refurb comes your way.

Far less drama and really good prices vs. buying new or taking your chances shucking drives. That said, my preferences are influenced by the desire to run He (helium) drives to save on heat and power. They're pretty rare new in the 10TB capacity, and they're certainly not available with 5-year OEM warranties.

Thus, I prefer buying a refurb from goharddrive.com with a 5-year warranty over a new one with just a 3-year.

I think that's what I'm gonna do, especially now that I picked up a 16i HBA.
