Random reboots: best practices to debug/understand what is happening?

The thread was about DDR5 memory modules in a confined space getting hot and causing errors. Once the modules were cooled sufficiently or downclocked, the errors went away.
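
If you want to check whether something similar is happening on your machine, one rough way is to watch the memory error counters before and after putting load on the system. The sketch below assumes a Linux box with ECC memory and the kernel EDAC driver loaded, so that counters show up under /sys/devices/system/edac/mc (none of that is stated in the thread, and on non-ECC systems the directory is usually empty). The function name `read_edac_counts` is just something I made up for illustration.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int or None, "ue": int or None}}.

    "ce" is the corrected-error count and "ue" the uncorrected-error count
    that the EDAC driver exposes per memory controller (mc0, mc1, ...).
    Assumption: Linux with an EDAC driver loaded; otherwise the result is {}.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / filename).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

Run it once when the machine is cool and again after a memory-heavy workload. If the corrected count climbs as the modules warm up, that points in the same direction as the thread: heat-related memory errors rather than, say, a failing PSU.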