I have 5 Dell PowerEdge R350 servers, each with a Xeon E-2314 CPU (2.80GHz), 8GB RAM, a pair of 256GB SSDs on the BOSS-S2 card for the OS, and a PERC H755 with 4x 2TB 6Gb SATA drives (Seagate ST2000NM012B-2TD) in JBOD mode, set up as a RAIDZ1 pool. They are all running TrueNAS Core 13.0-U6.7.
It seems that whenever I run a large file transfer, the system slowly grinds to a halt and locks up, and the screen shows a bunch of "plugin_dispatch_values: low water mark reached, dropping 100% of metrics" errors.
Once it’s dead, I have to do a warm reset, after which the console shows hours of "zio_deadman(): zio_wait waiting for hung I/O to pool" messages. After that, the system comes back up as expected.
All 5 of my servers are doing this, and have done so with some regularity since the day I set them up.
What logs do I need to check to see what the heck is going on?
Thanks!