TrueNAS Crashing when I generate significant loads

Currently air cooling with a DeepCool AK500. The main times I spike usage are when I use Nextcloud to search through historic photos or when Plex does something like a big scan/analytics. Since the BIOS update this morning there haven’t been any unexpected reboots. I will run Prime95 tomorrow during the day.

I ended up buying the Supermicro 9300-8I 12Gb/s LSI 9300-8i and another 9 fans for more cooling. The fans should be here tomorrow and the HBA next week or the week after, so there is plenty of time to keep testing the stability. Once I get it and it’s all stable I will migrate the data from my ironwolf4s pool to my new IronWolf pools with an rsync command and set up my shares again. I’m unsure what to do with the old drives; I’m tempted to use them until failure.

I bought a whole replacement set once my original drives (also bought used) were more than 5 years old. I have had one HDD failure since. But at $76 for a 10TB drive + a 5 year warranty, I consider that cheap insurance, especially after qualifying the drives with badblocks, etc.

If the drives are too low-capacity for NAS use, you can always set them up in an external RAID box for off-site backup use for critical NAS data.

This is what we’re imagining

IMG_1415

My NAS is screaming, I resemble that statement!

Very close. This is with the side panels off and on. Note the left side is the drives and the right is the system.




There are also like 5 pages in the manual dedicated to air cooling. Here is a snippet of its future, haha.


I just had this happen to me as well. I found that any time the system was under load it would shut off and restart, and I could trigger the crash on demand by uploading a large amount of test files. My issue was the power supply, a Thermaltake PSU, which was too weak. From what I can see, your setup is very close to mine. Not exactly, but very close. I wasn’t getting any error messages or anything in the OS, just a sudden restart. I ended up going with a Corsair PSU and so far the system has been stable for well over 24 hours, and I have put several heavy loads on the drives. Hopefully this gives you somewhere to go. I dealt with mine for over 3 weeks before I finally had enough.

Yeah, I am at 2 weeks with mine, but since updating the BIOS I haven’t had any trouble yet, even doing tasks that would typically cause it to restart. I’m letting it run Prime95 today to validate the stability.

I had run Prime95, Memtest, and a whole ton of other tests and they all passed. So I was thinking that I had an overclocking profile set. I checked the BIOS and there was none set. From there I did a clean install of Dragonfish and installed Immich or whatever it is called, and so far so good. Then I started to upload my photos from my phone to the server: click, then restart. I even removed the UPS, thinking that maybe it had gone bad on me. I tried the transfer yet again and it crashed again. The issue, I found, is that I had miscalculated my power draw. The Thermaltake was 500 watts and the system needed 593 watts to be stable. The Thermaltake may also have been damaged in the last round of storms we had here in Texas, even though it was on the UPS. Rare, but it can happen.

I recalculated the power draw of the processor, then added 50 watts for the motherboard and 15 watts for every drive in the system. Don’t forget about the fans. They are about 5 watts each. DDR5 takes up a good chunk of power too, so I allocated for that as well. I’m just now getting started in the NAS game and boy is it a learning curve at first.
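The rule of thumb above can be sketched as a small calculator. This is only a rough estimate under the per-component figures quoted in this post (50 W motherboard, 15 W per drive, about 5 W per fan); the function name, the example CPU and RAM wattages, and the drive and fan counts are made up for illustration, not taken from anyone's actual build.

```python
def estimate_system_watts(cpu_watts, drive_count, fan_count, ram_watts=0.0,
                          motherboard_watts=50.0, watts_per_drive=15.0,
                          watts_per_fan=5.0):
    """Rough power estimate using the rule-of-thumb figures from the post.

    The defaults (50 W board, 15 W per drive, 5 W per fan) are the poster's
    ballpark numbers, not measured values.
    """
    return (cpu_watts
            + motherboard_watts
            + drive_count * watts_per_drive
            + fan_count * watts_per_fan
            + ram_watts)


if __name__ == "__main__":
    # Hypothetical build: 125 W CPU, 8 drives, 6 fans, 10 W set aside for RAM.
    total = estimate_system_watts(cpu_watts=125, drive_count=8,
                                  fan_count=6, ram_watts=10)
    print(f"Estimated draw: {total:.0f} W")
```

Whatever the total comes out to, it is worth comparing it against the PSU rating with some margin left over, since this kind of estimate says nothing about short peaks such as drive spin-up.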

Funny part is none of the power issues showed up until I upgraded to the Dragonfish version. I am thinking that since it allocates the ARC a bit differently, it changed my power requirements, but I can’t be positive of that. It sure does seem like it.

Glad to hear you are running and stable. Hopefully you stay that way. Do you spin down your drives? I just started spinning mine down since I mainly use the NAS on the weekends for Plex.

I have never spun my drives down. I have seen a lot of mixed reviews about spinning drives down. Because I have family on the east and west coasts of Australia accessing my NAS at random times, I find it better to leave everything up.

I have a 750 W PSU. The CPU maxes out at 220 W when turboing, then there is 180 W for all my drives and 50 W for the motherboard. The fans each claim to be 1.4 W, so my new total of 20 fans comes to 28-30 W. That leaves about 270 W for the SATA expansion card and the RAM, so I think I have enough overhead.
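As a quick sanity check on that arithmetic, here is a short sketch that adds up the figures quoted in this post (750 W PSU, 220 W CPU, 180 W drives, 50 W motherboard, roughly 30 W of fans). The dictionary keys and the function name are just illustrative labels, and the SATA card and RAM are deliberately left out because no wattage was given for them.

```python
PSU_WATTS = 750

# Figures as quoted in the post; fans use the upper end of the 28-30 W range.
loads_watts = {
    "cpu_turbo": 220,
    "drives": 180,
    "motherboard": 50,
    "fans": 30,
}


def remaining_watts(psu_watts, loads):
    """Watts left on the PSU rating after the listed loads are subtracted."""
    return psu_watts - sum(loads.values())


if __name__ == "__main__":
    used = sum(loads_watts.values())
    left = remaining_watts(PSU_WATTS, loads_watts)
    print(f"Accounted for: {used} W ({used / PSU_WATTS:.0%} of the PSU rating)")
    print(f"Left for SATA card, RAM and margin: {left} W")
```

With these numbers the listed parts add up to 480 W, which leaves about 270 W on paper for everything not listed.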

I have been running Prime95 most of today. I noticed that when I tried interacting with my running apps (stopping them/starting them) it caused Prime95 to just stop with the message zsh cancelled. But that was after it had been running for 2 hours.

I’m now at 3.5 hours on my second run using blended mode with 100% on all 28 threads and still sitting at 40-42 degrees on the CPU, so I am pretty happy with my air cooler’s performance. In total it’s drawing between 178-185 W from the wall. I’m going to let it keep running to a full 8 hours, but bar that one hiccup during the test I am pretty confident that it’s stable now.

For reference, Proper Power Supply Sizing Guidance | TrueNAS Community is still valid.

@jgreco put together a great guide and there are some very good observations in it, such as how systems spin up drives. Better backplanes / systems will sequentially stagger the drive spin-up with small delays in order to minimize the stress on the PSU.
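To illustrate why staggering helps, here is a toy model of the peak 12 V current with and without a delay between drives. All of the numbers (2 A per drive while spinning up, 0.5 A once running, 8 seconds to reach speed) are placeholder assumptions for the sketch, not values from the guide or from any drive datasheet, and a drive that has not started yet is treated as drawing nothing.

```python
def peak_12v_amps(drive_count, stagger_s, spin_up_amps=2.0, idle_amps=0.5,
                  spin_up_s=8.0, step_s=0.5):
    """Peak summed 12 V current while drives spin up one after another.

    Drive d starts at d * stagger_s seconds, draws spin_up_amps for
    spin_up_s seconds, then settles at idle_amps. All defaults are
    illustrative assumptions.
    """
    end_s = (drive_count - 1) * stagger_s + spin_up_s
    steps = int(end_s / step_s) + 1
    peak = 0.0
    for i in range(steps):
        t = i * step_s
        total = 0.0
        for d in range(drive_count):
            start = d * stagger_s
            if t < start:
                continue                # not started yet, modelled as 0 A
            if t < start + spin_up_s:
                total += spin_up_amps   # still spinning up
            else:
                total += idle_amps      # up to speed
        peak = max(peak, total)
    return peak


if __name__ == "__main__":
    print("All 8 drives at once:", peak_12v_amps(8, stagger_s=0.0), "A")
    print("8 drives, 8 s apart: ", peak_12v_amps(8, stagger_s=8.0), "A")
```

In this model, starting all eight drives together peaks at 16 A, while spacing them a full spin-up time apart peaks at 5.5 A, which is the kind of difference a staggering backplane is meant to buy.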

While the measured plug load was never higher than about 125W for my system, my 750W PSU still croaked after about 5 years of use. Seasonic replaced it in a week, now I keep a spare.

Whether my motherboard can stagger drives or not is an open question. Electrically, all the drives are on the same power bus, so the motherboard / TrueNAS would have to do so via SATA commands. I seem to remember my Synology doing the stagger during boot, i.e. watching one drive LED after the other come up as I heard the disks spin up.

12 V rails come into play. I remember that from back in my overclocking days with the Pentium PresHOTs.

Putting too many drives on a single shared rail can cause a trip because of the amps being pulled through it and the voltage going out of spec.
