Specifically, the issue appears to be RAM. I would start by adding up all of the memory you have allocated to each of your Apps plus VMs. What is that sum? That should represent the worst case scenario.
Subtract from 64 and how much is left?
edit: Can you also share the output of lspci -vv
specifically for the section of your iGPU? Also cat /var/log/messages| grep GTT
I dont have an Intel iGPU so your logs may appear a little bit differant. In my case, it appears the Linux kernel is dynamically allocating as much as 16GiB for my AMD iGPUā¦This doesnāt mean it WILL use that much, just that it CAN. If youāre using it for Frigateā¦yeahā¦that certainly can be part of your problem.
root@prod[~]# cat /var/log/messages| grep GTT
Nov 4 10:41:57 truenas kernel: [drm] amdgpu: 15741M of GTT memory ready.
Nov 11 12:08:07 prod kernel: [drm] amdgpu: 15741M of GTT memory ready.
Nov 23 21:49:12 prod kernel: [drm] amdgpu: 15741M of GTT memory ready.
Dec 20 22:01:47 prod kernel: [drm] amdgpu: 15743M of GTT memory ready.
Dec 20 22:13:12 prod kernel: [drm] amdgpu: 15743M of GTT memory ready.
root@prod[~]#
10:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c9) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 168
IOMMU group: 4
Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at e0000000 (64-bit, prefetchable) [size=2M]
Region 4: I/O ports at e000 [size=256]
Region 5: Memory at fce00000 (32-bit, non-prefetchable) [size=512K]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [c0] MSI-X: Enable+ Count=4 Masked-
Vector table: BAR=5 offset=00042000
PBA: BAR=5 offset=00043000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [270 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [2b0 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [2c0 v1] Page Request Interface (PRI)
PRICtl: Enable+ Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00000100, Page Request Allocation: 00000020
Capabilities: [2d0 v1] Process Address Space ID (PASID)
PASIDCap: Exec+ Priv+, Max PASID Width: 10
PASIDCtl: Enable- Exec- Priv-
Capabilities: [400 v1] Data Link Feature <?>
Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [440 v1] Lane Margining at the Receiver <?>
Kernel driver in use: amdgpu
Kernel modules: amdgpu
Iām not an expert here, but I read this in my example as the GPU using a minimum of 256M+2M+512K with a maximum available of 16G.