NAS randomly hangs

Hi all,

I originally had a NAS/server running on Ubuntu. After some research, I decided TrueNAS was a better OS for what I use it for. During the migration, however, my new motherboard broke, so I temporarily used an Intel motherboard with a matching CPU. That worked, but it didn't have the performance I needed (and the RAM was a bit on the low side).

I have now upgraded to an AMD Ryzen system (specs below) and I'm very happy with its performance and power consumption. However, the server regularly stops responding. Sometimes it prints strange error messages on the console (see below); other times the screen is just black.

Does anyone have an idea what could be causing this, or how I can debug it?

Thanks in advance

OS: TrueNAS-SCALE-24.10.2.1
Model: AMD Ryzen 5 1600
RAM: 16 GiB (2 sticks)

(I can't upload photos, so here is the console output as text.)

[98175.079040] ? nmi_handle+0x61/0x150
[98175.079046] ? default_do_nmi+0x40/0x100
[98175.079050] ? exc_nmi+0x125/0x1a0
[98175.079053] ? end_repeat_nmi+0x16/0x67
[98175.079061] ? smp_call_function_many_cond+0x11e/0x4f0
[98175.079064] ? smp_call_function_many_cond+0x11e/0x4f0
[98175.079067] ? smp_call_function_many_cond+0x11e/0x4f0
[98175.079072] ? __pfx_flush_tlb_func+0x10/0x10
on_each_cpu_cond_mask+0x24/0x40
flush_tlb_mm_range+0x105/0x150
flush_tlb_batched_pending+0x40/0x60
unmap_page_range+0x4ae/0x10c0
[98175.079093] ? __call_rcu_common.constprop.0+0xe5/0x6b0
srso_return_thunk+0x5/0x5f
unmap_vmas+0xb5/0x190
unmap_region.constprop.0+0xe3/0x160
do_vmi_align_munmap+0x33b/0x4c0
do_vmi_munmap+0xdc/0x170
__vm_munmap+0xa4/0x150
__x64_sys_munmap+0x1b/0x30
do_syscall_64+0x59/0xb0
? do_syscall_64+0x65/0xb0
entry_SYSCALL_64_after_hwframe+0x78/0xe2
[98175.079147] RIP: 0033:0x7f7557b3f8f7
[98175.079150] Code: 00 00 00 48 8b 15 09 05 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 04 0d 00 f7 d8 64 89 01 48
[98175.079152] RSP: 002b:00007f755021fa98 EFLAGS: 00000246 ORIG_RAX: 000000000000000b
[98175.079155] RAX: ffffffffffffffda RBX: 00007f7543dd6000 RCX: 00007f7557b3f8f7
[98175.079156] RDX: 00007f7557b34c40 RSI: 0000000000009000 RDI: 00007f7543dd6000
[98175.079158] RBP: 0000000000009000 R08: 00007f755021fb30 R09: 0000000000000000
[98175.079159] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000000bcd87
[98175.079160] R13: 00007f755021fdac R14: 00007f755021fb30 R15: 0000000000000000

Did you apply the BIOS tweaks required for first-generation Ryzen?

For older BIOS versions:

  • Disable AMD Cool'n'Quiet
  • Disable ErP Ready
  • Disable global C-states

For newer BIOS versions:

  • Change Power Supply Idle Control from "Low Current Idle" to "Typical Current Idle"

Could this be the cause?
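If you want to see which CPU idle states (C-states) Linux is actually using before and after changing the BIOS settings, one option is to read the kernel's cpuidle sysfs tree (TrueNAS SCALE is Debian-based Linux). Below is a minimal Python sketch, assuming the standard layout `/sys/devices/system/cpu/cpuN/cpuidle/stateM/` with its `name`, `usage` and `time` files. The helper names `idle_states` and `read_attr` are hypothetical and not part of TrueNAS; this is a diagnostic aid, not a fix.

```python
# Hypothetical diagnostic sketch, not a TrueNAS tool: list the CPU idle
# states (C-states) the Linux cpuidle driver exposes for each core, plus how
# often each state has been entered since boot.
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_attr(path: Path) -> str:
    # sysfs attribute files end with a newline; strip it.
    return path.read_text().strip()


def idle_states(cpu_root: Path = SYSFS_CPU) -> dict[str, list[dict[str, str]]]:
    """Map each CPU (e.g. "cpu0") to its idle states: name, entry count, time in microseconds."""
    result: dict[str, list[dict[str, str]]] = {}
    cpus = sorted(cpu_root.glob("cpu[0-9]*"), key=lambda p: int(p.name[3:]))
    for cpu_dir in cpus:
        cpuidle = cpu_dir / "cpuidle"
        if not cpuidle.is_dir():
            continue  # e.g. no cpuidle driver for this CPU: nothing to report
        states = sorted(cpuidle.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
        result[cpu_dir.name] = [
            {
                "name": read_attr(s / "name"),
                "usage": read_attr(s / "usage"),   # times this state was entered
                "time_us": read_attr(s / "time"),  # total time spent in it (microseconds)
            }
            for s in states
        ]
    return result


if __name__ == "__main__":
    for cpu, states in idle_states().items():
        summary = ", ".join(f"{s['name']} ({s['usage']} entries)" for s in states)
        print(f"{cpu}: {summary}")
```

If the BIOS change took effect, the deeper states may disappear from the list or their entry counts may stay at or near zero. Exactly what shows up depends on the firmware and the idle driver in use, so treat the output as a hint rather than proof.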

Some great advice has already been given. In addition, you should run a CPU stress test for a while, followed by a RAM test for several complete passes, to rule out faulty hardware and give you some peace of mind.

Best of luck troubleshooting this problem.