SMB service randomly turns off due to going OOM

Edit (Possible Solution): As recommended by Stux, I upgraded to 24.04.2. I did some stress testing and I haven’t run into the OOM problem so far. I don’t have a way to reliably reproduce the OOM error, so I’ll leave it like that for now and see if I run into it again.

If I do, awalkerix also recommended either disabling AIO writes for the SMB share or upgrading to EE, so I’ll try those two solutions if the first one doesn’t work. I’ll also definitely upgrade to EE when the stable version is out.

Hi,

Since I moved to TrueNAS a few months ago, I have been experiencing an issue where I randomly lose access to my SMB shares and/or the TN web management interface. Usually the web interface becomes accessible again after a few minutes, but when the shares go down, I have to restart the SMB service in TN. Sometimes both go down at the same time, sometimes only one or the other.

At first I thought it was a network issue, but through a process of elimination I narrowed it down to TN killing the services after running out of memory (OOM Killer). It seems to happen more during write-intensive tasks, such as when I back up my Proxmox VMs to my backups SMB share, but not always. Sometimes it happens at night when there isn’t much going on.
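For anyone trying to confirm the same diagnosis, a quick way is to search the kernel log for OOM-killer activity. This is just a generic sketch; the log paths assume SCALE's defaults, so adjust them for your setup:

```shell
# Pattern matching the lines the kernel emits when the OOM killer fires
PATTERN='invoked oom-killer|oom_reaper|Out of memory'

# Search the syslog files (paths assumed; adjust for your install)
grep -hE "$PATTERN" /var/log/messages 2>/dev/null

# Or query the journal for the current boot instead
journalctl -k -b 2>/dev/null | grep -E "$PATTERN"
true  # grep exits non-zero when nothing matched; that just means no OOM events
```

If either command prints `invoked oom-killer` / `oom_reaper` lines like the ones below, it was the OOM killer and not the network.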

I have multiple nodes accessing the storage at the same time, but nothing extreme. Usually there are 3–5 concurrent connections; at worst 10, but that’s extremely rare, maybe once a year.

Has anybody experienced such a problem? I’m not quite sure what to do to fix it. Any help would be greatly appreciated.

If I could change a setting to stop this from happening, that would be perfect.

If not, I could add another 32 GB of RAM, but I’m not sure whether that would fix the issue or just fill up as well. My performance is fine right now, so I’d rather not spend the money if it doesn’t fix the issue.

Thank you

My specs:

  • TN Dragonfish-24.04.1.1 in a Proxmox VM (the host runs only this VM, nothing else)
  • TN SCALE is only used as a NAS; there are no VMs, containers, or other services running
  • 28 GB RAM DDR4-3200 (2 × 16 GB = 32 GB total for the Proxmox hypervisor)
  • 6 cores (i7-6700K, 8 threads on the hypervisor)
  • 32 GB disk space (running on a WD 850X NVMe SSD)
  • LSI 9207 HBA (passed through)
  • NICs:
    • 40 GbE Mellanox ConnectX-3 via VirtIO
    • 1 GbE Intel NIC via VirtIO
  • Storage (ZFS dedup off):
    • 6 × 10 TB RAIDZ2
    • 2 × 12 TB stripe (20–60 MB/s constant reads)
    • 2 × 500 GB mirror
    • 1 × 4 TB (constant 20 MB/s writes, CCTV)
    • 1 × 1 TB

Here is the messages log showing that the OOM Killer killed my SMB service. This time it happened while I was backing up all my VMs.

Sep 9 03:01:44 truenas kernel: smbd[10.0.101.7 invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Sep 9 03:01:44 truenas kernel: CPU: 0 PID: 122092 Comm: smbd[10.0.101.7 Tainted: P OE 6.6.29-production+truenas #1
Sep 9 03:01:44 truenas kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 4.2023.08-4 02/15/2024
Sep 9 03:01:44 truenas kernel: Call Trace:
Sep 9 03:01:44 truenas kernel:
Sep 9 03:01:44 truenas kernel: dump_stack_lvl+0x47/0x60
Sep 9 03:01:44 truenas kernel: dump_header+0x4a/0x1d0
Sep 9 03:01:44 truenas kernel: oom_kill_process+0xf9/0x190
Sep 9 03:01:44 truenas kernel: out_of_memory+0x256/0x540
Sep 9 03:01:44 truenas kernel: __alloc_pages_slowpath.constprop.0+0xb23/0xe20
Sep 9 03:01:44 truenas kernel: __alloc_pages+0x32b/0x350
Sep 9 03:01:44 truenas kernel: folio_alloc+0x1b/0x50
Sep 9 03:01:44 truenas kernel: __filemap_get_folio+0x128/0x2c0
Sep 9 03:01:44 truenas kernel: filemap_fault+0x5df/0xb60
Sep 9 03:01:44 truenas kernel: __do_fault+0x33/0x130
Sep 9 03:01:44 truenas kernel: do_fault+0x2b0/0x4f0
Sep 9 03:01:44 truenas kernel: __handle_mm_fault+0x790/0xd90
Sep 9 03:01:44 truenas kernel: ? do_syscall_64+0x65/0xb0
Sep 9 03:01:44 truenas kernel: handle_mm_fault+0x182/0x370
Sep 9 03:01:44 truenas kernel: do_user_addr_fault+0x1fb/0x660
Sep 9 03:01:44 truenas kernel: exc_page_fault+0x77/0x170
Sep 9 03:01:44 truenas kernel: asm_exc_page_fault+0x26/0x30
Sep 9 03:01:44 truenas kernel: RIP: 0033:0x7f116be8bfff
Sep 9 03:01:44 truenas kernel: Code: Unable to access opcode bytes at 0x7f116be8bfd5.
Sep 9 03:01:44 truenas kernel: RSP: 002b:00007ffd3c0a2560 EFLAGS: 00010202
Sep 9 03:01:44 truenas kernel: RAX: 0000000000000001 RBX: 00005649df535e70 RCX: 00007f1170cd0e50
Sep 9 03:01:44 truenas kernel: RDX: 00007f116be90cd0 RSI: 00007f1170cd329b RDI: 00007f1170cd66fa
Sep 9 03:01:44 truenas kernel: RBP: 00005649df4c9030 R08: 0000000000000000 R09: 0000000000000073
Sep 9 03:01:44 truenas kernel: R10: 0000000000000000 R11: 00000000ffffffff R12: 00007ffd3c0a2760
Sep 9 03:01:44 truenas kernel: R13: 00007ffd3c0a2680 R14: 00005649df51dea0 R15: 00005649df9cc230
Sep 9 03:01:44 truenas kernel:
Sep 9 03:01:44 truenas kernel: Mem-Info:
Sep 9 03:01:44 truenas kernel: active_anon:5517 inactive_anon:521089 isolated_anon:0
active_file:11 inactive_file:47 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:7244 slab_unreclaimable:666726
mapped:5934 shmem:7197 pagetables:2757
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:153262 free_pcp:230 free_cma:8351
Sep 9 03:01:44 truenas kernel: Node 0 active_anon:22068kB inactive_anon:2084356kB active_file:44kB inactive_file:188kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:23736kB dirty:0kB writeback:0kB shmem:28788kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:1038336kB writeback_tmp:0kB kernel_stack:10896kB pagetables:11028kB sec_pagetables:0kB all_unreclaimable? no
Sep 9 03:01:44 truenas kernel: Node 0 DMA free:3132kB boost:0kB min:16kB low:20kB high:24kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:7728kB managed:7228kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Sep 9 03:01:44 truenas kernel: lowmem_reserve: 0 1868 27951 27951 27951
Sep 9 03:01:44 truenas kernel: Node 0 DMA32 free:108452kB boost:0kB min:4516kB low:6428kB high:8340kB reserved_highatomic:0KB active_anon:16kB inactive_anon:611684kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2050944kB managed:1985260kB mlocked:0kB bounce:0kB free_pcp:852kB local_pcp:0kB free_cma:0kB
Sep 9 03:01:44 truenas kernel: lowmem_reserve: 0 0 26083 26083 26083
Sep 9 03:01:44 truenas kernel: Node 0 Normal free:519732kB boost:174696kB min:237744kB low:264452kB high:291160kB reserved_highatomic:260096KB active_anon:22052kB inactive_anon:1472616kB active_file:152kB inactive_file:204kB unevictable:0kB writepending:0kB present:27262976kB managed:26717132kB mlocked:0kB bounce:0kB free_pcp:152kB local_pcp:0kB free_cma:33404kB
Sep 9 03:01:44 truenas kernel: lowmem_reserve: 0 0 0 0 0
Sep 9 03:01:44 truenas kernel: Node 0 DMA: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 0*4096kB 0*8192kB 0*16384kB 0*32768kB 0*65536kB = 3132kB
Sep 9 03:01:44 truenas kernel: Node 0 DMA32: 235*4kB (UME) 179*8kB (UME) 129*16kB (UME) 86*32kB (UME) 43*64kB (UME) 59*128kB (UME) 32*256kB (ME) 42*512kB (UM) 24*1024kB (UME) 16*2048kB (M) 1*4096kB (M) 0*8192kB 0*16384kB 0*32768kB 0*65536kB = 108628kB
Sep 9 03:01:44 truenas kernel: Node 0 Normal: 7897*4kB (UMEHC) 1792*8kB (UMEC) 538*16kB (UMEC) 1427*32kB (UMEHC) 328*64kB (UMEC) 342*128kB (UMC) 1108*256kB (UMC) 80*512kB (UMC) 7*1024kB (MC) 7*2048kB (C) 2*4096kB (C) 0*8192kB 0*16384kB 0*32768kB 0*65536kB = 519268kB
Sep 9 03:01:44 truenas kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 9 03:01:44 truenas kernel: 7449 total pagecache pages
Sep 9 03:01:44 truenas kernel: 0 pages in swap cache
Sep 9 03:01:44 truenas kernel: Free swap = 0kB
Sep 9 03:01:44 truenas kernel: Total swap = 0kB
Sep 9 03:01:44 truenas kernel: 7330412 pages RAM
Sep 9 03:01:44 truenas kernel: 0 pages HighMem/MovableOnly
Sep 9 03:01:44 truenas kernel: 153007 pages reserved
Sep 9 03:01:44 truenas kernel: 65536 pages cma reserved
Sep 9 03:01:44 truenas kernel: 0 pages hwpoisoned
Sep 9 03:01:44 truenas kernel: Tasks state (memory values in pages):
Sep 9 03:01:44 truenas kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Sep 9 03:01:44 truenas kernel: [ 677] 104 677 2111 352 57344 0 -900 dbus-daemon
Sep 9 03:01:44 truenas kernel: [ 690] 0 690 7816 480 86016 0 -250 systemd-journal
Sep 9 03:01:44 truenas kernel: [ 707] 0 707 6701 640 77824 0 -1000 systemd-udevd
Sep 9 03:01:44 truenas kernel: [ 1864] 0 1864 938010 73387 1110016 0 0 asyncio_loop
Sep 9 03:01:44 truenas kernel: [ 1872] 0 1872 4211 1888 73728 0 0 python3
Sep 9 03:01:44 truenas kernel: [ 1922] 0 1922 188116 14024 290816 0 0 middlewared (ze
Sep 9 03:01:44 truenas kernel: [ 1926] 0 1926 17807 9570 184320 0 0 python3
Sep 9 03:01:44 truenas kernel: [ 4170] 106 4170 1969 160 57344 0 0 rpcbind
Sep 9 03:01:44 truenas kernel: [ 4217] 0 4217 1208 96 49152 0 0 blkmapd
Sep 9 03:01:44 truenas kernel: [ 4239] 0 4239 20060 96 61440 0 0 qemu-ga
Sep 9 03:01:44 truenas kernel: [ 4242] 0 4242 3093 608 61440 0 0 smartd
Sep 9 03:01:44 truenas kernel: [ 4253] 0 4253 198028 1172 430080 0 0 syslog-ng
Sep 9 03:01:44 truenas kernel: [ 4255] 0 4255 11737 160 69632 0 0 gssproxy
Sep 9 03:01:44 truenas kernel: [ 4257] 0 4257 4170 320 73728 0 0 systemd-logind
Sep 9 03:01:44 truenas kernel: [ 4263] 0 4263 18742 800 135168 0 0 winbindd
Sep 9 03:01:44 truenas kernel: [ 4265] 0 4265 43783 352 86016 0 0 zed
Sep 9 03:01:44 truenas kernel: [ 4274] 0 4274 3905 384 69632 0 -1000 sshd
Sep 9 03:01:44 truenas kernel: [ 4287] 131 4287 4715 234 57344 0 0 chronyd
Sep 9 03:01:44 truenas kernel: [ 4330] 131 4330 2686 246 57344 0 0 chronyd
Sep 9 03:01:44 truenas kernel: [ 4331] 0 4331 18759 857 135168 0 0 wb[TRUENAS]
Sep 9 03:01:44 truenas kernel: [ 4332] 0 4332 19971 1408 143360 0 0 smbd
Sep 9 03:01:44 truenas kernel: [ 4377] 0 4377 1005 128 40960 0 0 cron
Sep 9 03:01:44 truenas kernel: [ 4441] 0 4441 109808 14227 229376 0 0 cli
Sep 9 03:01:44 truenas kernel: [ 4477] 0 4477 19379 731 122880 0 0 smbd-notifyd
Sep 9 03:01:44 truenas kernel: [ 4481] 0 4481 19383 731 131072 0 0 smbd-cleanupd
Sep 9 03:01:44 truenas kernel: [ 4538] 0 4538 7048 471 61440 0 0 nginx
Sep 9 03:01:44 truenas kernel: [ 4541] 33 4541 7355 599 73728 0 0 nginx
Sep 9 03:01:44 truenas kernel: [ 4748] 0 4748 19256 798 139264 0 0 wb-idmap
Sep 9 03:01:44 truenas kernel: [ 4766] 999 4766 105927 25886 421888 0 -900 netdata
Sep 9 03:01:44 truenas kernel: [ 4773] 999 4773 11693 320 61440 0 -900 netdata
Sep 9 03:01:44 truenas kernel: [ 6397] 105 6397 1829 192 49152 0 0 avahi-daemon
Sep 9 03:01:44 truenas kernel: [ 6448] 105 6448 1792 131 49152 0 0 avahi-daemon
Sep 9 03:01:44 truenas kernel: [ 6627] 999 6627 17977 6644 147456 0 -900 python.d.plugin
Sep 9 03:01:44 truenas kernel: [ 6705] 1 6705 7825 4000 102400 0 0 wsdd.py
Sep 9 03:01:44 truenas kernel: [ 6749] 0 6749 17405 640 122880 0 0 nmbd
Sep 9 03:01:44 truenas kernel: [ 121924] 0 121924 22186 1630 155648 0 0 smbd[10.0.102.2
Sep 9 03:01:44 truenas kernel: [ 122092] 3000 122092 33747 12725 253952 0 0 smbd[10.0.101.7
Sep 9 03:01:44 truenas kernel: [ 122408] 0 122408 25348 1430 172032 0 0 smbd[10.0.101.5
Sep 9 03:01:44 truenas kernel: [ 122409] 0 122409 23629 1318 159744 0 0 smbd[10.0.101.5
Sep 9 03:01:44 truenas kernel: [ 122490] 3000 122490 34354 13748 266240 0 0 smbd[10.0.102.5
Sep 9 03:01:44 truenas kernel: [ 122518] 0 122518 18555 712 118784 0 0 samba-dcerpcd
Sep 9 03:01:44 truenas kernel: [ 122527] 0 122527 19628 832 143360 0 0 rpcd_lsad
Sep 9 03:01:44 truenas kernel: [ 122529] 0 122529 19525 832 135168 0 0 rpcd_lsad
Sep 9 03:01:44 truenas kernel: [ 277000] 3000 277000 131155 110027 1032192 0 0 smbd[10.0.1.4]
Sep 9 03:01:44 truenas kernel: [ 287922] 3000 287922 26782 5294 192512 0 0 smbd[10.0.2.3]
Sep 9 03:01:44 truenas kernel: [ 292159] 0 292159 35616 11866 262144 0 0 smbd[10.0.1.3]
Sep 9 03:01:44 truenas kernel: [ 296349] 3000 296349 31528 7589 221184 0 0 smbd[10.0.1.2]
Sep 9 03:01:44 truenas kernel: [ 335885] 0 335885 26825 3825 184320 0 0 smbd[10.0.100.2
Sep 9 03:01:44 truenas kernel: [ 420274] 0 420274 28748 160 73728 0 0 nscd
Sep 9 03:01:44 truenas kernel: [ 420415] 999 420415 1113 224 45056 0 -900 bash
Sep 9 03:01:44 truenas kernel: [ 464089] 0 464089 187210 46308 634880 0 0 middlewared (wo
Sep 9 03:01:44 truenas kernel: [ 467472] 0 467472 205742 45840 651264 0 0 middlewared (wo
Sep 9 03:01:44 truenas kernel: [ 474367] 0 474367 187197 45654 638976 0 0 middlewared (wo
Sep 9 03:01:44 truenas kernel: [ 477888] 0 477888 186940 45307 634880 0 0 middlewared (wo
Sep 9 03:01:44 truenas kernel: [ 484953] 0 484953 187196 44733 634880 0 0 middlewared (wo
Sep 9 03:01:44 truenas kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/smbd.service,task=smbd[10.0.1.4],pid=277000,uid=3000
Sep 9 03:01:46 truenas kernel: oom_reaper: reaped process 277000 (smbd[10.0.1.4]), now anon-rss:0kB, file-rss:256kB, shmem-rss:21372kB
Sep 9 03:14:44 truenas qemu-ga[4239]: info: guest-ping called
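For scale, the memory values in the task-state table above are in 4 KiB pages, so the smbd instance that got killed (pid 277000, rss 110027) was holding roughly 430 MiB. A quick conversion, using the numbers from the log:

```shell
# Task-state columns are 4 KiB pages; convert the killed smbd's rss to MiB
rss_pages=110027       # rss of smbd[10.0.1.4] from the table above
total_vm_pages=131155  # total_vm of the same process
echo "rss      = $(( rss_pages * 4 / 1024 )) MiB"
echo "total_vm = $(( total_vm_pages * 4 / 1024 )) MiB"
```

That one connection alone was an order of magnitude bigger than the other smbd processes, which fits a single client queuing up a large backlog of writes.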

Update to 24.04.2, which has significant swap/memory changes.

Then try again.

Ok, I ran the update. I’ll run some backups to see if I can trigger the issue.
Thank you

This is a known issue with this sort of workflow. Proxmox basically queues up writes indefinitely until the smbd process goes OOM. You can disable AIO writes for the SMB share (aio write size = 0) or upgrade to the EE BETA, where a mechanism was added to put backpressure on clients.
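For reference, that workaround is a stock Samba option, not anything TrueNAS-specific. It would go into the SMB auxiliary parameters (the exact UI location varies by TrueNAS version), something like:

```
# Auxiliary parameter for the affected share:
# 0 disables Samba's asynchronous writes, so smbd can no longer
# accumulate an unbounded queue of in-flight write requests in memory
aio write size = 0
```

The trade-off is that writes become synchronous from smbd's point of view, which can cost some throughput on fast links.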

Thank you,
I updated to 24.04.2 and ran a few backups. So far everything has gone well, but I don’t have a surefire way to trigger the OOM, so I don’t know whether that fixed it or not.

If it didn’t, I’ll try your two suggestions. I’m not very familiar with AIO writes for the SMB share, so I’ll have to do some reading about it. Otherwise, I read that EE is scheduled for an early release on September 24 and a stable release on October 29, so I will definitely move to it either at the end of September or when the stable version is out.

I have been seeing the same issue for a while now: about once a month, SMB crashes due to OOM on my TrueNAS hosted on Proxmox.

I have seen this issue on 24.04.2 as well, so that didn’t fix it for me.

I have read in some thread that the ratio of CPU cores to RAM can affect Samba. Reducing to 4 cores has made this event less common, but it hasn’t solved it yet.

As I mentioned, this is addressed in 24.10. In theory you can set the parameter I mentioned above to work around it on 24.04.2.