25.04.02 - "Services" consuming RAM, causing the OOM killer to kill processes

Hello Everyone,

I’m currently running 25.04.02 and have noticed that the RAM usage on my server is slowly being taken over by “Services” until the OOM killer ends up killing the TrueNAS services and I’m forced to reboot from the CLI.

After a reboot, the services memory usage goes back to near 0. Right now, I have 4 SMB mounts and a single NFS mount. The “Services” figure in the web GUI is currently showing 19GB (while I was writing this it increased to 21GB and then killed TrueNAS).

There are no VMs, no containers and no apps. There are no Data Protection tasks configured other than scrub and SMART tests.

Using htop, this is what I see (as a new user I apparently can’t upload images or txt files):

a bunch of middlewared processes using 19.9GB and then a bunch of incusd processes using 6.5GB

If I look at top, I can see that asyncio_loop is listed as using 19.5GB of virtual memory and 16.7GB resident:

1396 root      20   0   19.5g  16.7g  21668 S   5.9  53.4  10:14.46 asyncio_loop
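To put numbers on the slow growth, something like the following could log the resident memory of that process over time. This is only a sketch: the function names are made up for illustration, it assumes a Linux `/proc` filesystem, and PID 1396 is just the value from the top line above (it changes after every reboot).

```python
import re
import time


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status, or None."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_rss_gib(pid):
    """Read the current resident set size of a process in GiB."""
    with open(f"/proc/{pid}/status") as f:
        kib = parse_vmrss_kib(f.read())
    if kib is None:
        raise ValueError(f"no VmRSS line for pid {pid}")
    return kib / 2**20  # KiB -> GiB


def growth_per_hour(samples):
    """samples: list of (unix_seconds, rss_gib). Returns GiB/hour from first to last sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) / ((t1 - t0) / 3600)


def watch(pid, count=6, interval_s=600):
    """Take `count` samples, `interval_s` seconds apart, printing each one."""
    samples = []
    for i in range(count):
        rss = read_rss_gib(pid)
        samples.append((time.time(), rss))
        print(f"{time.strftime('%H:%M:%S')} pid={pid} rss={rss:.2f} GiB")
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Calling `watch(1396)` on the NAS would take six samples ten minutes apart, and `growth_per_hour(samples)` then gives a rough growth rate that can be compared before and after a config change.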

At one point I did configure the App service to try out rustdesk on it, but I decided against it and deleted the app and unset the pool. This was a few weeks ago but I believe the issue started happening after I did this.

I tried to attach the txt file of the messages log but am not allowed to, so here is a snippet from when the OOM killer ran:

Aug  3 01:02:59 truenas kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=containerd.service,mems_allowed=0,global_oom,task_memcg=/system.slice/systemd-machined.service,task=systemd-machine,pid=32326,uid=0
Aug  3 01:02:59 truenas kernel: avahi-daemon invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
Aug  3 01:02:59 truenas kernel: CPU: 4 UID: 105 PID: 4513 Comm: avahi-daemon Tainted: P           OE      6.12.15-production+truenas #1
Aug  3 01:02:59 truenas kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Aug  3 01:02:59 truenas kernel: Hardware name: System manufacturer System Product Name/MAXIMUS VIII HERO, BIOS 3703 12/27/2017
Aug  3 01:02:59 truenas kernel: Call Trace:
Aug  3 01:02:59 truenas kernel:  <TASK>
Aug  3 01:02:59 truenas kernel:  dump_stack_lvl+0x64/0x80
Aug  3 01:02:59 truenas kernel:  dump_header+0x43/0x160
Aug  3 01:02:59 truenas kernel:  oom_kill_process+0xfa/0x200
Aug  3 01:02:59 truenas kernel:  out_of_memory+0x257/0x520
Aug  3 01:02:59 truenas kernel:  __alloc_pages_slowpath.constprop.0+0x696/0xdf0
Aug  3 01:02:59 truenas kernel:  __alloc_pages_noprof+0x30e/0x330
Aug  3 01:02:59 truenas kernel:  alloc_pages_mpol_noprof+0x8f/0x1f0
Aug  3 01:02:59 truenas kernel:  get_free_pages_noprof+0x11/0x40
Aug  3 01:02:59 truenas kernel:  __pollwait+0xa5/0x120
Aug  3 01:02:59 truenas kernel:  pipe_poll+0xa5/0x160
Aug  3 01:02:59 truenas kernel:  do_sys_poll+0x2da/0x600
Aug  3 01:02:59 truenas kernel:  ? __pfx___pollwait+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  ? __pfx_pollwake+0x10/0x10
Aug  3 01:02:59 truenas kernel:  __x64_sys_poll+0xbb/0x140
Aug  3 01:02:59 truenas kernel:  do_syscall_64+0x82/0x190
Aug  3 01:02:59 truenas kernel:  ? audit_reset_context+0x232/0x300
Aug  3 01:02:59 truenas kernel:  ? syscall_exit_to_user_mode_prepare+0x148/0x170
Aug  3 01:02:59 truenas kernel:  ? syscall_exit_to_user_mode+0x10/0x1f0
Aug  3 01:02:59 truenas kernel:  ? do_syscall_64+0x8e/0x190
Aug  3 01:02:59 truenas kernel:  ? pipe_write+0x407/0x640
Aug  3 01:02:59 truenas kernel:  ? audit_filter_rules.constprop.0+0x140/0x1120
Aug  3 01:02:59 truenas kernel:  ? __audit_filter_op+0xaf/0x110
Aug  3 01:02:59 truenas kernel:  ? audit_reset_context+0x232/0x300
Aug  3 01:02:59 truenas kernel:  ? syscall_exit_to_user_mode_prepare+0x148/0x170
Aug  3 01:02:59 truenas kernel:  ? syscall_exit_to_user_mode+0x10/0x1f0
Aug  3 01:02:59 truenas kernel:  ? do_syscall_64+0x8e/0x190
Aug  3 01:02:59 truenas kernel:  ? syscall_exit_to_user_mode+0x10/0x1f0
Aug  3 01:02:59 truenas kernel:  ? do_syscall_64+0x8e/0x190
Aug  3 01:02:59 truenas kernel:  ? __audit_filter_op+0xaf/0x110
Aug  3 01:02:59 truenas kernel:  ? audit_reset_context+0x232/0x300
Aug  3 01:02:59 truenas kernel:  ? syscall_exit_to_user_mode_prepare+0x148/0x170
Aug  3 01:02:59 truenas kernel:  ? syscall_exit_to_user_mode+0x10/0x1f0
Aug  3 01:02:59 truenas kernel:  ? do_syscall_64+0x8e/0x190
Aug  3 01:02:59 truenas kernel:  ? exc_page_fault+0x76/0x190
Aug  3 01:02:59 truenas kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug  3 01:02:59 truenas kernel: RIP: 0033:0x7f1f7223c1a0
Aug  3 01:02:59 truenas kernel: Code: Unable to access opcode bytes at 0x7f1f7223c176.
Aug  3 01:02:59 truenas kernel: RSP: 002b:00007ffdbf9339c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000007
Aug  3 01:02:59 truenas kernel: RAX: ffffffffffffffda RBX: 000055b771a46b80 RCX: 00007f1f7223c1a0
Aug  3 01:02:59 truenas kernel: RDX: 0000000000000100 RSI: 000000000000000a RDI: 000055b771a62920
Aug  3 01:02:59 truenas kernel: RBP: 00007f1f71de9740 R08: 0000000000000000 R09: 000055b771a97f00
Aug  3 01:02:59 truenas kernel: R10: 00007ffdbf933990 R11: 0000000000000202 R12: 000055b771a4a960
Aug  3 01:02:59 truenas kernel: R13: 0000000000000000 R14: 000055b771a45100 R15: 00007f1f723198c0
Aug  3 01:02:59 truenas kernel:  </TASK>
Aug  3 01:02:59 truenas kernel: Mem-Info:
Aug  3 01:02:59 truenas kernel: active_anon:6675 inactive_anon:3791376 isolated_anon:0
 active_file:15 inactive_file:86 isolated_file:0
 unevictable:31 dirty:21 writeback:95
 slab_reclaimable:17682 slab_unreclaimable:72510
 mapped:190 shmem:9169 pagetables:8591
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:27552 free_pcp:125 free_cma:1660
Aug  3 01:02:59 truenas kernel: Node 0 active_anon:26700kB inactive_anon:15165504kB active_file:356kB inactive_file:8kB unevictable:124kB isolated(anon):0kB isolated(file):0kB mapped:728kB dirty:84kB writeback:380kB shmem:36676kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:1310720kB writeback_tmp:0kB kernel_stack:9008kB pagetables:34364kB sec_pagetables:0kB all_unreclaimable? no
Aug  3 01:02:59 truenas kernel: Node 0 DMA free:116kB boost:2048kB min:2112kB low:2128kB high:2144kB reserved_highatomic:0KB active_anon:0kB inactive_anon:13192kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug  3 01:02:59 truenas kernel: lowmem_reserve[]: 0 2028 15765 0 0
Aug  3 01:02:59 truenas kernel: Node 0 DMA32 free:8668kB boost:0kB min:8684kB low:10852kB high:13020kB reserved_highatomic:0KB active_anon:0kB inactive_anon:2131692kB active_file:124kB inactive_file:0kB unevictable:0kB writepending:48kB present:2220060kB managed:2153548kB mlocked:0kB bounce:0kB free_pcp:500kB local_pcp:0kB free_cma:0kB
Aug  3 01:02:59 truenas kernel: lowmem_reserve[]: 0 0 13736 0 0
Aug  3 01:02:59 truenas kernel: Node 0 Normal free:101592kB boost:67584kB min:126412kB low:141116kB high:155820kB reserved_highatomic:0KB active_anon:26700kB inactive_anon:13020620kB active_file:316kB inactive_file:100kB unevictable:124kB writepending:416kB present:14401536kB managed:14073524kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:6640kB
Aug  3 01:02:59 truenas kernel: lowmem_reserve[]: 0 0 0 0 0
Aug  3 01:02:59 truenas kernel: Node 0 DMA: 9*4kB (U) 4*8kB (U) 1*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB 0*65536kB = 116kB
Aug  3 01:02:59 truenas kernel: Node 0 DMA32: 0*4kB 1*8kB (U) 7*16kB (UM) 17*32kB (UM) 8*64kB (UM) 5*128kB (UM) 6*256kB (UM) 2*512kB (M) 0*1024kB 2*2048kB (M) 0*4096kB 0*8192kB 0*16384kB 0*32768kB 0*65536kB = 8472kB
Aug  3 01:02:59 truenas kernel: Node 0 Normal: 5934*4kB (UMEC) 2388*8kB (UMEC) 826*16kB (UMEC) 737*32kB (UMEC) 223*64kB (UME) 18*128kB (UMEC) 3*256kB (EC) 1*512kB (C) 2*1024kB (C) 1*2048kB (C) 0*4096kB 0*8192kB 0*16384kB 0*32768kB 0*65536kB = 101592kB
Aug  3 01:02:59 truenas kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Aug  3 01:02:59 truenas kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug  3 01:02:59 truenas kernel: 9306 total pagecache pages
Aug  3 01:02:59 truenas kernel: 0 pages in swap cache
Aug  3 01:02:59 truenas kernel: Free swap  = 0kB
Aug  3 01:02:59 truenas kernel: Total swap = 0kB
Aug  3 01:02:59 truenas kernel: 4159396 pages RAM
Aug  3 01:02:59 truenas kernel: 0 pages HighMem/MovableOnly
Aug  3 01:02:59 truenas kernel: 98788 pages reserved
Aug  3 01:02:59 truenas kernel: 65536 pages cma reserved
Aug  3 01:02:59 truenas kernel: 0 pages hwpoisoned
Aug  3 01:02:59 truenas kernel: Tasks state (memory values in pages):
Aug  3 01:02:59 truenas kernel: [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
Aug  3 01:02:59 truenas kernel: [    834]   104   834     2035      293      192      101         0    53248        0          -900 dbus-daemon
Aug  3 01:02:59 truenas kernel: [    846]     0   846    11540      374      224      150         0   114688        0          -250 systemd-journal
Aug  3 01:02:59 truenas kernel: [    864]     0   864     6898      637      576       61         0    73728        0         -1000 systemd-udevd
Aug  3 01:02:59 truenas kernel: [   1410]     0  1410  4462944  3656219  3656101       54        64 29954048        0         -1000 asyncio_loop
Aug  3 01:02:59 truenas kernel: [   1427]     0  1427     4194     1696     1568      128         0    69632        0         -1000 python3
Aug  3 01:02:59 truenas kernel: [   1476]     0  1476     9208     4283     4186       97         0   106496        0         -1000 python3
Aug  3 01:02:59 truenas kernel: [   2859]     0  2859     1435      270      192       78         0    49152        0         -1000 dhclient
Aug  3 01:02:59 truenas kernel: [   2866]     0  2866     1435      249      192       57         0    53248        0         -1000 dhclient
Aug  3 01:02:59 truenas kernel: [   3043]     0  3043      711      132       65       67         0    49152        0             0 rpc.idmapd
Aug  3 01:02:59 truenas kernel: [   3044]     0  3044     1302      189       97       92         0    49152        0             0 nfsdcld
Aug  3 01:02:59 truenas kernel: [   3045]     0  3045    21714      218       98      120         0    57344        0         -1000 auditd
Aug  3 01:02:59 truenas kernel: [   3118]   106  3118     1970      207       96      111         0    53248        0             0 rpcbind
Aug  3 01:02:59 truenas kernel: [   3119]   107  3119     1134      143       73       70         0    49152        0             0 rpc.statd
Aug  3 01:02:59 truenas kernel: [   3120]     0  3120     1258      158      100       58         0    45056        0             0 rpc.mountd
Aug  3 01:02:59 truenas kernel: [   3121]     0  3121     1258      163      100       63         0    45056        0             0 rpc.mountd
Aug  3 01:02:59 truenas kernel: [   3122]     0  3122     1258      163      100       63         0    45056        0             0 rpc.mountd
Aug  3 01:02:59 truenas kernel: [   3134]     0  3134    28122      162       96       66         0    77824        0             0 gssproxy
Aug  3 01:02:59 truenas kernel: [   3150]     0  3150    38187      107       64       43         0    61440        0         -1000 lxcfs
Aug  3 01:02:59 truenas kernel: [   3187]   131  3187     4715      172      138       34         0    69632        0             0 chronyd
Aug  3 01:02:59 truenas kernel: [   3196]   131  3196     2686      195      116       79         0    65536        0             0 chronyd
Aug  3 01:02:59 truenas kernel: [   3206]     0  3206   542608     4307     4247       60         0   339968        0          -999 containerd
Aug  3 01:02:59 truenas kernel: [   3261]     0  3261     1005       86        0       86         0    49152        0             0 cron
Aug  3 01:02:59 truenas kernel: [   4285]   999  4285   124696    39282    39250        0        32   544768        0          -900 netdata
Aug  3 01:02:59 truenas kernel: [   4444]   999  4444    11686      275      192       83         0    73728        0          -900 netdata
Aug  3 01:02:59 truenas kernel: [   4513]   105  4513     1820      149       64       85         0    61440        0             0 avahi-daemon
Aug  3 01:02:59 truenas kernel: [   4574]   105  4574     1722       95       73       22         0    61440        0             0 avahi-daemon
Aug  3 01:02:59 truenas kernel: [   6804] 65534  6804     2906      226      128       98         0    65536        0             0 dnsmasq
Aug  3 01:02:59 truenas kernel: [   7023]     0  7023   589509     6102     6102        0         0   446464        0          -500 dockerd
Aug  3 01:02:59 truenas kernel: [   7227]   999  7227    22237     4735     4735        0         0   139264        0          -900 python.d.plugin
Aug  3 01:02:59 truenas kernel: [   7816]   950  7816     1460      201      160       41         0    45056        0         -1000 zsh
Aug  3 01:02:59 truenas kernel: [   7834]     0  7834    19581      168       16       75        77   143360        0             0 wb-idmap
Aug  3 01:02:59 truenas kernel: [  50510]     0 50510   207154    12974    12888       86         0   307200        0         -1000 middlewared (wo
Aug  3 01:02:59 truenas kernel: [  50653]     0 50653   209877    15673    15573      100         0   323584        0         -1000 middlewared (wo
Aug  3 01:02:59 truenas kernel: [  50658]     0 50658   207091    12926    12891       35         0   311296        0         -1000 middlewared (wo
Aug  3 01:02:59 truenas kernel: [  51659]     0 51659   206867    12526    12481       45         0   307200        0         -1000 middlewared (wo
Aug  3 01:02:59 truenas kernel: [  51764]     0 51764   207170    12964    12889       75         0   303104        0         -1000 middlewared (wo
Aug  3 01:02:59 truenas kernel: [  53427]   999 53427     1113      213      160       53         0    49152        0          -900 bash
Aug  3 01:02:59 truenas kernel: [  53597]     0 53597    12294       86       32       54         0    77824        0             0 nscd
Aug  3 01:02:59 truenas kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=avahi-daemon.service,mems_allowed=0,global_oom,task_memcg=/system.slice/incus.service,task=dnsmasq,pid=6804,uid=65534
Aug  3 01:02:59 truenas systemd-journald[846]: /dev/kmsg buffer overrun, some messages lost.
Aug  3 01:02:59 truenas systemd-journald[846]: Data hash table of /var/log/journal/b867530c94a744a28c56138b28f44c71/system.journal has a fill level at 75.0 (8533 of 11377 items, 6553600 file size, 768 bytes per hash table item), suggesting rotation.
Aug  3 01:02:59 truenas systemd-journald[846]: /var/log/journal/b867530c94a744a28c56138b28f44c71/system.journal: Journal header limits reached or header out-of-date, rotating.
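A note for reading the task table in that dump: the memory columns are in pages, not bytes. The sketch below converts the `rss` column to GiB and sorts by it. It assumes 4 KiB pages (the x86_64 default, and consistent with the dump, where 3791376 pages of inactive_anon are reported as 15165504kB); the regex and function name are my own and only target the column layout shown in this log.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86_64 default

# Columns: [pid] uid tgid total_vm rss rss_anon rss_file rss_shmem
#          pgtables_bytes swapents oom_score_adj name
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<rss_anon>\d+)\s+(?P<rss_file>\d+)\s+(?P<rss_shmem>\d+)\s+"
    r"(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)\s+(?P<oom_score_adj>-?\d+)\s+"
    r"(?P<name>.+)$"
)


def parse_tasks(log_text):
    """Return the task rows of an OOM dump, largest resident memory first."""
    tasks = []
    for line in log_text.splitlines():
        m = TASK_RE.search(line)
        if m:  # the header row and non-table lines do not match
            tasks.append({
                "pid": int(m["pid"]),
                "name": m["name"].strip(),
                "rss_gib": int(m["rss"]) * PAGE_SIZE / 2**30,
                "oom_score_adj": int(m["oom_score_adj"]),
            })
    return sorted(tasks, key=lambda t: t["rss_gib"], reverse=True)


# Three rows copied from the log above.
SAMPLE = """\
Aug  3 01:02:59 truenas kernel: [   1410]     0  1410  4462944  3656219  3656101       54        64 29954048        0         -1000 asyncio_loop
Aug  3 01:02:59 truenas kernel: [   4285]   999  4285   124696    39282    39250        0        32   544768        0          -900 netdata
Aug  3 01:02:59 truenas kernel: [   6804] 65534  6804     2906      226      128       98         0    65536        0             0 dnsmasq
"""

for t in parse_tasks(SAMPLE):
    print(f"{t['name']:<14} pid={t['pid']:<6} rss={t['rss_gib']:.2f} GiB adj={t['oom_score_adj']}")
```

On these rows asyncio_loop comes out at roughly 13.95 GiB resident, out of the 4159396 pages (about 15.9 GiB) of RAM reported in the same dump. Its oom_score_adj of -1000 exempts it from the OOM killer, which is presumably why small processes such as dnsmasq get killed instead.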

Does anyone have any ideas?

Please do run some drive tests and include the boot pool as well.

I ran a long test on the boot pool drives. Results:

root@truenas[/home/truenas_admin]# smartctl --all /dev/sda   
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 Series
Serial Number:    S14GNEBCB68736D
LU WWN Device Id: 5 002538 5500b7272
Firmware Version: DXT0AB0Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5804
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Aug  6 10:16:37 2025 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (53956) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  40) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       82047
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       1319
177 Wear_Leveling_Count     0x0013   040   040   000    Pre-fail  Always       -       721
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   071   051   000    Old_age   Always       -       29
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       97
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       225839346522

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     16511         -
# 2  Short offline       Completed without error       00%     16147         -
# 3  Extended offline    Completed without error       00%     16125         -
# 4  Short offline       Completed without error       00%     15979         -
# 5  Short offline       Completed without error       00%     15816         -
# 6  Extended offline    Completed without error       00%     15789         -
# 7  Short offline       Completed without error       00%     15643         -
# 8  Short offline       Completed without error       00%     15475         -
# 9  Extended offline    Completed without error       00%     15405         -
#10  Short offline       Completed without error       00%     15307         -
#11  Short offline       Completed without error       00%     15139         -
#12  Extended offline    Completed without error       00%     15069         -
#13  Short offline       Completed without error       00%     14971         -
#14  Short offline       Completed without error       00%     14664         -
#15  Short offline       Completed without error       00%     14496         -
#16  Extended offline    Completed without error       00%     14354         -
#17  Short offline       Completed without error       00%     14328         -
#18  Short offline       Completed without error       00%     14160         -
#19  Short offline       Completed without error       00%     13992         -
#20  Extended offline    Completed without error       00%     13970         -
#21  Short offline       Completed without error       00%     13824         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

root@truenas[/home/truenas_admin]# smartctl --all /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 EVO 250GB
Serial Number:    S1DBNSAF554038N
LU WWN Device Id: 5 002538 8a0491b02
Firmware Version: EXT0DB6Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database 7.3/5804
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Aug  6 10:18:53 2025 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 247) Self-test routine in progress...
                                        70% of test remaining.
Total time to complete Offline 
data collection:                ( 4800) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  80) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       8586
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       1883
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       57
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   065   055   000    Old_age   Always       -       35
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       1379
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       14747127528

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         0         -
# 2  Short offline       Completed without error       00%        37         -
# 3  Short offline       Completed without error       00%      2628         -
# 4  Short offline       Completed without error       00%      2460         -
# 5  Extended offline    Completed without error       00%      2438         -
# 6  Short offline       Completed without error       00%      1099         -
# 7  Short offline       Completed without error       00%       936         -
# 8  Extended offline    Completed without error       00%       909         -
# 9  Short offline       Completed without error       00%       763         -
#10  Short offline       Completed without error       00%       595         -
#11  Extended offline    Completed without error       00%       525         -
#12  Short offline       Completed without error       00%       427         -
#13  Short offline       Completed without error       00%       259         -
#14  Extended offline    Completed without error       00%       189         -
#15  Short offline       Completed without error       00%        91         -
#16  Short offline       Completed without error       00%         0         -
#17  Extended offline    Completed without error       00%      1412         -
#18  Short offline       Completed without error       00%      1385         -
#19  Short offline       Completed without error       00%      1217         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

I’m running long tests on all of the disks, but they haven’t finished yet. All of the previously run tests (short and long) show success.

Is there anything else I should run?

I would add scrubbing the boot pool to the list.

[sudo] zpool scrub boot-pool

However, I’d wait until the SMART tests are finished. Is the machine rack-mounted? I’m asking because my cats cause me to regularly check the cables at home.

Please also post the output of

[sudo] zpool list

You wrote that you are seeing some incusd processes but don’t actually have any containers running. So I’d double-check the container tab and unset any pool or other kind of activation there.

So I had some problems with the NAS. I pulled one of the boot drives, and it at least started up and stayed up. I bought a new SSD to replace it, which should be here today.

scrubbed the boot-pool:

  pool: boot-pool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:00:37 with 0 errors on Thu Aug  7 06:26:43 2025
config:

        NAME                      STATE     READ WRITE CKSUM
        boot-pool                 DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            sda3                  ONLINE       0     0     0
            10973333999425691178  UNAVAIL      0     0     0  was /dev/sdb3

errors: No known data errors

zpool list:

root@truenas[/home/truenas_admin]# zpool list                                                                          
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
POOL       87.3T  53.2T  34.1T        -         -     1%    60%  1.00x    ONLINE  /mnt
boot-pool   232G  16.9G   215G        -         -    13%     7%  1.00x  DEGRADED  -
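If it helps anyone who wants to keep an eye on this automatically, here is a small sketch that picks non-ONLINE pools out of pasted `zpool list` output. The function name is hypothetical, and it assumes the default column layout with a header row, as in the output above.

```python
def unhealthy_pools(zpool_list_text):
    """Return (name, health) for every pool whose HEALTH column is not ONLINE."""
    lines = [ln for ln in zpool_list_text.splitlines() if ln.strip()]
    header = lines[0].split()
    name_i, health_i = header.index("NAME"), header.index("HEALTH")
    result = []
    for ln in lines[1:]:
        cols = ln.split()  # empty properties are printed as "-", so columns stay aligned
        if cols[health_i] != "ONLINE":
            result.append((cols[name_i], cols[health_i]))
    return result


# Output copied from the post above.
SAMPLE = """\
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
POOL       87.3T  53.2T  34.1T        -         -     1%    60%  1.00x    ONLINE  /mnt
boot-pool   232G  16.9G   215G        -         -    13%     7%  1.00x  DEGRADED  -
"""

print(unhealthy_pools(SAMPLE))
```

For the output in this post it reports only boot-pool as DEGRADED, which matches the missing sdb3 mirror member in the scrub status.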

I went through the containers and found that the service was enabled, so I disabled it. The “Services” memory usage is showing as 15.1GB, so hopefully it stays there.

So the system was still eating RAM and crashing.

On the new SSD I have reinstalled 25.04.02 and imported my pool, so none of the previous config should be there. I’ll try this and see what happens.
