I’m very new to servers and VMs. My server keeps crashing, and I’ve ruled out two cases: it does not happen with only Proxmox running (all VMs turned off), and it does not happen with only the TrueNAS VM running on Proxmox (all apps stopped). In both cases the server stayed up for several days. With Jellyfin or Home Assistant running, though, the entire server crashes anywhere from a couple of minutes to a few hours in, so running apps seems to be the trigger.
The first time I tried to install Home Assistant, the server crashed immediately and I couldn’t get anything to load again, which led to a full reinstall of the OS and VMs. I thought this might be a hardware issue at first, but I’m now fairly certain the problem is in software. With Jellyfin running the crashes come back, although the server does boot again when I reset it.
I don’t know if it’s related, but I also now get a zfs-import-scan failed message when Proxmox first boots; it then goes away and Proxmox finishes loading.
What in my apps setup could be causing this? Perhaps the CPU and RAM settings? I’m not sure how to get logs or diagnostic info, but I’m willing to learn and go dig for it. Thanks for any help for a noob!
Motherboard: Minisforum AR900i with 13900HX (24 physical cores, 32 logical cores). The BIOS on this thing sucks: I thought it might have been an idle power issue, but the only setting I could change was disabling C-States.
RAM: 64GB Crucial DDR5 (2x 32GB) 5200 MHz
Proxmox Main/Boot Drive: Samsung 990 Pro 4TB
4x Crucial P3 Plus 4TB
M.2 NVMe to OCuLink Adapter
4060 Lp Gigabyte (PCIe Passthrough to Windows 11 VM, located on outside of short server rack chassis)
PSU: Corsair SFX 750W Plat
QNAP 10GbE and 2x M.2 NVMe adapter card (QM2-2P10G1TB)
WD 14TB EasyStore External USB (Proxmox Backup Drive)
3 of the Crucial drives are dedicated to TrueNAS in RAIDZ1 for storage (the VM itself is on the main Proxmox drive)
1 of the Crucial drives is a secondary drive for the Windows 11 VM
In total, 3 drives (the 990 Pro and 2x P3 Plus) are connected directly to the AR900i, and the OCuLink adapter fills the last of the 4 M.2 slots on the mobo. The other 2 P3 Plus drives are connected to the QNAP card.
3 VMs total, all on the Proxmox drive: Truenas, Windows 11, Ubuntu
Truenas Apps: Pihole, Nginx, Jellyfin, Plex, and Handbrake. Not sure if I had any issues with the other apps besides Jellyfin and HA
VM Memory Allocation: 16GB to TrueNAS, 16GB to Windows, and 16GB to Ubuntu
VM CPU core allocation: TrueNAS - 8 cores; Windows - 2 sockets x 8 cores (16 logical cores); Ubuntu - 4 cores
TrueNAS Apps Memory and CPU Allocation: memory default setting 8GB? CPU default 4000?
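For reference, a quick tally of the VM allocations listed above against what the host has. This is only a sketch of the arithmetic; the names (`HOST_RAM_GB`, `vms`, `totals`) are made up for illustration, and the per-app limits are left out because I am not sure of the defaults.

```python
# Tally of the VM allocations listed above against the host's resources.
HOST_RAM_GB = 64    # 2x 32 GB
HOST_THREADS = 32   # 13900HX logical cores

# name: (RAM in GB, vCPUs). Windows is 2 sockets x 8 cores = 16 vCPUs.
vms = {
    "truenas": (16, 8),
    "windows": (16, 2 * 8),
    "ubuntu": (16, 4),
}


def totals(vms):
    """Return (total RAM in GB, total vCPUs) across all VMs."""
    ram = sum(r for r, _ in vms.values())
    vcpus = sum(c for _, c in vms.values())
    return ram, vcpus


ram, vcpus = totals(vms)
print(f"RAM: {ram} of {HOST_RAM_GB} GB allocated, {HOST_RAM_GB - ram} GB left for the host")
print(f"vCPUs: {vcpus} of {HOST_THREADS} host threads")
```

That works out to 48 of 64 GB and 28 of 32 threads with all three VMs running, and the TrueNAS apps all have to fit inside that VM's 16 GB.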
Also, as far as I knew I had only 1 boot-pool, but I think there may be traces of files from the first setup, the one that crashed and had to be redone. I deleted the second zpool but the problem still happened.
It’s weird that the crash is so hard you need to reinstall the Proxmox host to get the system back up.
Not sure what the cause would be in this scenario other than to suggest that you consider what sets Jellyfin and Home Assistant apart. Do they use specific hardware or devices (GPUs, possibly other things)?
Perhaps there is some form of HW instability that is only triggered when either of those apps is in use, and otherwise lies dormant.
Have you verified that the RAM is okay with something like memtest86?
Does anything change if you remove the resource limits on the apps in TrueNAS?
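On getting logs: one place to start is the systemd journal from the boot before the crash, run on the Proxmox host. Below is a rough sketch that wraps `journalctl` from Python. The helper names (`journal_cmd`, `read_journal`) are made up for illustration, and it assumes the journal is persistent on your install; if it is not, there will be no previous boot to show. A hard lockup may also leave nothing in the log at all.

```python
import shutil
import subprocess


def journal_cmd(boot=-1, priority="err", kernel_only=False):
    """Build a journalctl command for one boot (-1 = the previous boot)."""
    cmd = ["journalctl", "--no-pager", f"--boot={boot}", f"--priority={priority}"]
    if kernel_only:
        cmd.append("--dmesg")  # restrict output to kernel messages
    return cmd


def read_journal(cmd):
    """Run the command and return its output, or a note if journalctl is absent."""
    if shutil.which(cmd[0]) is None:
        return "journalctl not found on this machine"
    result = subprocess.run(cmd, capture_output=True, text=True)
    # journalctl reports problems (e.g. no such boot) on stderr
    return result.stdout or result.stderr


if __name__ == "__main__":
    # Errors and worse from the boot that ended in the crash
    print(read_journal(journal_cmd()))
```

The same command can simply be typed into the shell as `journalctl --no-pager --boot=-1 --priority=err`; the last lines before the crash are usually the interesting ones.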
Well, I wasn’t getting display out at all no matter what I tried after the HA crash. So I cleared the CMOS, which didn’t work. I then booted a Proxmox img from the BIOS just to see if I could get display out, and that worked. A new problem occurred after that: the server said no OS was installed. Not sure if loading the img maybe caused an issue. I didn’t go to the install screen, so that was weird.
When I had everything back up and running I tried installing HA once again. The server then immediately locked up and crashed. I was lucky that when I reset the server this time it booted back up. From then on I decided not to install HA. Once I had everything set up again, I discovered the system was only stable for a short time when Jellyfin was running. So far I have TrueNAS running with the other apps, with Jellyfin stopped, and have had no issues.
How do I remove the resource limit on the apps? I think I tried deleting the value, but that gave me an error and wouldn’t let me save the app’s configuration. I’m at a loss. I do think it may have something to do with resource allocation, but I’m not sure what to set everything to. I thought these apps didn’t need much CPU or RAM?
They are directly attached NVMe drives. The mobo has 4 M.2 slots, 2 on the top (PCIe 5.0) and 2 on the bottom. The last two NVMe drives are attached via the PCIe x16 slot on the mobo using the NIC/M.2 card.
The Proxmox drive is attached to one of the top M.2 slots on the mobo. The OCuLink M.2 adapter is attached to the second one.
If you need pictures let me know. I think I was wrong about the server crashing: I recently attached a portable monitor and I can still see the Proxmox shell terminal working, and I stupidly hadn’t noticed that the main CPU fan was still spinning. So the server may actually be fine, and for some reason the Ethernet connection keeps dropping. Sorry for being an idiot and not realizing that sooner. I caught covid the last couple of days so my brain wasn’t all there lol. I might buy another NIC to see if that is the culprit.
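One way to test the dropped-link theory before buying hardware would be to log link state changes from the Proxmox shell while Jellyfin is running. A minimal sketch, assuming a Linux host where the kernel exposes `/sys/class/net/<iface>/operstate`; the function names are made up, and the interface name has to be supplied (I have not confirmed what the QNAP port is called on this box).

```python
import sys
import time
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for iface ('up', 'down', ...), or 'missing'."""
    path = Path(sysfs_root) / iface / "operstate"
    try:
        return path.read_text().strip()
    except OSError:
        return "missing"


def watch(iface, interval=1.0, samples=60, sysfs_root="/sys/class/net"):
    """Poll the link and return a list of (timestamp, old, new) state changes."""
    changes = []
    last = read_link_state(iface, sysfs_root)
    for _ in range(samples):
        time.sleep(interval)
        state = read_link_state(iface, sysfs_root)
        if state != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            changes.append((stamp, last, state))
            print(f"{stamp} {iface}: {last} -> {state}")
            last = state
    return changes


# Only runs when an interface name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1], samples=3600)  # roughly one hour at 1 s intervals
```

If the timestamps of any `up -> down` lines match the moments the web UI becomes unreachable, that points at the NIC, cable, or switch rather than the apps.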