TrueNAS SCALE 24.04-RC.1 - home lab died (took about 20 minutes, but the IP is accessible again)

While I was posting this, the system got past “65” and is now responsive.

I was testing TrueNAS SCALE 24.04-RC.1 at home. It appears to have died.

Symptoms:

  • Earlier today the GUI stopped working.
  • No longer able to log in via the GUI/browser
    SSH did not work either, but I had not tried SSH before
  • IP was still pingable
  • Mounted shares still worked
    … went about my day job
  • Tried to access the API with PowerShell → errors. Unable to connect.

I looked at the screen → Out of Memory errors (screenshot)
Rebooted with Ctrl+Alt+Delete
Journal error (screenshot)
Rebooted again by pulling the power cable → hangs at “65” (screenshot)
The video card is on the motherboard; there is no other video card.

I’m assuming this is a reinstall. Right?
Thanks

Hardware
Dell Optiplex 7040 minitower
16GB RAM
NVME boot
4*500GB SSD RAIDZ1

Screenshots

Are you running any apps on the TrueNAS? It would also be helpful to know what you were doing with the system before the issue occurred.

I was doing file creation and copies.
I already had several directories and files uploaded (ISOs and smaller files).
I had created 500k files with the script below and was copying “dir1” to “dir1-copy”, etc.
While copying files, I was generating more files.
The day before, I had adjusted SYNC on the share (SYNC inherited / SYNC disabled). I did not see a performance difference.

Script to make bulk files (I modified what I found, but I do not recall the original link)


$KBSizes = @(1,2,4,8,16,32,64,128,256,512)
$Qty = 50000
$OutDir = "C:\temp\" #tested writing to Truenas mapped drive


#create and start a stopwatch object to measure how long it all takes.
$stopwatch = [Diagnostics.Stopwatch]::StartNew()


Foreach ($KB in $KBSizes){
    #create a fixed size byte array for later use.  make it the required file size.
    $bytearray = New-Object byte[] ($KB * 1024)
    
    #make directory
    $SizeOutDir = $OutDir + "Qty" + $Qty.ToString() + "_" + $KB.tostring() + "KB_files" + "\"
    [System.IO.Directory]::CreateDirectory($SizeOutDir) | Out-Null
    
    #create a CSRNG object
    $RNGObject = New-Object Security.Cryptography.RNGCryptoServiceProvider

    for ($n = 0; $n -lt $Qty; $n++){ # -lt so exactly $Qty files are created (-le would create one extra)

        # create a file stream handle with a name format 'filennnnn'
        $TempFullFilename = $SizeOutDir + "file$("{0:D5}" -f $n)"
        $stream = New-Object System.IO.FileStream($TempFullFilename, [System.IO.FileMode]::Create)

        # and a binary writer handle on top of it
        $writer = New-Object System.IO.BinaryWriter($stream)

        # Fill our array from the CSRNG
        $RNGObject.GetNonZeroBytes($bytearray)

        # Write the array to the new file
        $writer.Write($bytearray)

        # Close the writer, which also flushes and closes the underlying stream
        $writer.Close()

    }

}

# how long did it all take?
$stopwatch.stop()
$stopwatch
#

I thought the system had recovered, since the GUI was available.
The SMB share is not working.
Rebooted TrueNAS
Rebooted the Windows 10 client (not domain-joined)

Appended:
Embarrassed → SMB was not running. Started SMB and the share is accessible.
Note: SMB is set to start automatically.

Share was not accessible
The disks are not assigned / vdev offline
I’m going to reinstall

Can you send me a debug in a PM? In my signature you’ll find some information on how to do it.

Few things

You ran out of RAM and don’t have swap disks; that’s why it crashed on you:

Apr 17 01:03:44 home-tn01 kernel: Out of memory: Killed process 909 (asyncio_loop) total-vm:38575996kB, anon-rss:20369796kB, file-rss:96kB, shmem-rss:56kB, UID:0 pgtables:71536kB oom_score_adj:0
Apr 17 01:54:15 home-tn01 kernel: Out of memory: Killed process 3015324 (asyncio_loop) total-vm:38879780kB, anon-rss:20681416kB, file-rss:84kB, shmem-rss:0kB, UID:0 pgtables:72524kB oom_score_adj:0
Apr 17 02:43:54 home-tn01 kernel: Out of memory: Killed process 3096614 (asyncio_loop) total-vm:38544244kB, anon-rss:20565588kB, file-rss:248kB, shmem-rss:0kB, UID:0 pgtables:71864kB oom_score_adj:0
Apr 17 03:31:55 home-tn01 kernel: Out of memory: Killed process 3176531 (asyncio_loop) total-vm:37916380kB, anon-rss:19889600kB, file-rss:212kB, shmem-rss:0kB, UID:0 pgtables:70500kB oom_score_adj:0
Apr 17 04:22:03 home-tn01 kernel: Out of memory: Killed process 3253952 (asyncio_loop) total-vm:38947780kB, anon-rss:20778308kB, file-rss:496kB, shmem-rss:0kB, UID:0 pgtables:72668kB oom_score_adj:0
Apr 17 05:10:50 home-tn01 kernel: Out of memory: Killed process 3334721 (asyncio_loop) total-vm:38175496kB, anon-rss:20217820kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:71156kB oom_score_adj:0
Apr 17 06:00:01 home-tn01 kernel: Out of memory: Killed process 3413286 (asyncio_loop) total-vm:38572768kB, anon-rss:20427748kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:71916kB oom_score_adj:0
{
    "swap_disks": [],
    "error": null
}
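The OOM-kill lines in the log can be summarized with a short script. This is only a sketch: it assumes the journal format shown in this thread, assumes all lines fall on the same day (it only looks at the clock time), and the helper names are made up for illustration. It converts the kernel's kB figures to GiB and shows how far apart the kills were.

```python
import re

# Matches the kernel's OOM-kill line as it appears in this thread's journal excerpt.
OOM_RE = re.compile(
    r"(?P<clock>\d{2}:\d{2}:\d{2}) \S+ kernel: Out of memory: "
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024  # the kernel's "kB" here are KiB


def seconds_of_day(clock):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_oom_kills(lines):
    """Return one dict per OOM-kill line; other lines are ignored."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match is None:
            continue
        kills.append({
            "clock": match["clock"],
            "pid": int(match["pid"]),
            "name": match["name"],
            "total_vm_gib": int(match["total_vm_kb"]) / KIB_PER_GIB,
            "anon_rss_gib": int(match["anon_rss_kb"]) / KIB_PER_GIB,
        })
    return kills


def minutes_between_kills(kills):
    """Gaps between consecutive kills, in minutes (same-day assumption)."""
    secs = [seconds_of_day(kill["clock"]) for kill in kills]
    return [(later - earlier) / 60 for earlier, later in zip(secs, secs[1:])]


# First two kill lines copied from the log in this thread.
sample = [
    "Apr 17 01:03:44 home-tn01 kernel: Out of memory: Killed process 909 (asyncio_loop) "
    "total-vm:38575996kB, anon-rss:20369796kB, file-rss:96kB, shmem-rss:56kB, UID:0 "
    "pgtables:71536kB oom_score_adj:0",
    "Apr 17 01:54:15 home-tn01 kernel: Out of memory: Killed process 3015324 (asyncio_loop) "
    "total-vm:38879780kB, anon-rss:20681416kB, file-rss:84kB, shmem-rss:0kB, UID:0 "
    "pgtables:72524kB oom_score_adj:0",
]

kills = parse_oom_kills(sample)
for kill in kills:
    print(f'{kill["clock"]} pid {kill["pid"]} {kill["name"]}: '
          f'total-vm {kill["total_vm_gib"]:.2f} GiB, anon-rss {kill["anon_rss_gib"]:.2f} GiB')
print("minutes between kills:", [round(gap, 1) for gap in minutes_between_kills(kills)])
```

Run against the full log, this makes the pattern easier to see: the same process name is killed repeatedly, roughly every 50 minutes, at about the same size each time.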

When it crashed we can see:

Apr 17 01:03:44 home-tn01 kernel: Tasks state (memory values in pages):
Apr 17 01:03:44 home-tn01 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Apr 17 01:03:44 home-tn01 kernel: [    550]   104   550     2179       32    53248      256          -900 dbus-daemon
Apr 17 01:03:44 home-tn01 kernel: [    563]     0   563    15117       64   102400      320          -250 systemd-journal
Apr 17 01:03:44 home-tn01 kernel: [    579]     0   579     6639       96    81920      576         -1000 systemd-udevd
Apr 17 01:03:44 home-tn01 kernel: [    909]     0   909  9643999  5092487 73252864  3876860             0 asyncio_loop
Apr 17 01:03:44 home-tn01 kernel: [    913]     0   913     4211      128    69632     1664             0 python3
Apr 17 01:03:44 home-tn01 kernel: [    975]     0   975    17810      258   176128     9248             0 python3
Apr 17 01:03:44 home-tn01 kernel: [   2193]     0  2193     1468       64    49152      128             0 dhclient
Apr 17 01:03:44 home-tn01 kernel: [   2375]   106  2375     1969      128    53248       64             0 rpcbind
Apr 17 01:03:44 home-tn01 kernel: [   2412]     0  2412     1208       64    49152       32             0 blkmapd
Apr 17 01:03:44 home-tn01 kernel: [   2430]     0  2430    11737       32    65536      128             0 gssproxy
Apr 17 01:03:44 home-tn01 kernel: [   2433]     0  2433     3020       32    61440      448             0 smartd
Apr 17 01:03:44 home-tn01 kernel: [   2442]     0  2442   161110      443   380928     5216             0 syslog-ng
Apr 17 01:03:44 home-tn01 kernel: [   2451]     0  2451     8312       64    86016      256             0 systemd-logind
Apr 17 01:03:44 home-tn01 kernel: [   2452]     0  2452    43782       96    90112      288             0 zed
Apr 17 01:03:44 home-tn01 kernel: [   2459]   131  2459     4715       42    57344      128             0 chronyd
Apr 17 01:03:44 home-tn01 kernel: [   2478]   131  2478     2686       86    57344       96             0 chronyd
Apr 17 01:03:44 home-tn01 kernel: [   2558]     0  2558   144982     3971   245760    10208             0 cli
Apr 17 01:03:44 home-tn01 kernel: [   2633]     0  2633     7048       92    53248      320             0 nginx
Apr 17 01:03:44 home-tn01 kernel: [   2635]    33  2635     7583       64    69632      800             0 nginx
Apr 17 01:03:44 home-tn01 kernel: [   2637]     0  2637     1005       96    45056       32             0 cron
Apr 17 01:03:44 home-tn01 kernel: [   2641]   999  2641    93512    13997   327680     2112          -900 netdata
Apr 17 01:03:44 home-tn01 kernel: [   2650]   999  2650    11693       96    73728      192          -900 netdata
Apr 17 01:03:44 home-tn01 kernel: [   3534]   999  3534    18095     1948   147456     4704          -900 python.d.plugin
Apr 17 01:03:44 home-tn01 kernel: [   5471]     0  5471    18740       96   126976      576             0 winbindd
Apr 17 01:03:44 home-tn01 kernel: [   5500]     0  5500    18754      183   126976      608             0 wb[TRUENAS]
Apr 17 01:03:44 home-tn01 kernel: [   5514]     0  5514    19288      216   131072      544             0 wb-idmap
Apr 17 01:03:44 home-tn01 kernel: [   7566]     0  7566    18752      153   126976      608             0 wb[BUILTIN]
Apr 17 01:03:44 home-tn01 kernel: [  49312]   105 49312     1862       96    57344       96             0 avahi-daemon
Apr 17 01:03:44 home-tn01 kernel: [  49324]   105 49324     1792       69    57344       64             0 avahi-daemon
Apr 17 01:03:44 home-tn01 kernel: [  49383]     1 49383     7825      128    98304     3776             0 wsdd.py
Apr 17 01:03:44 home-tn01 kernel: [  49385]     0 49385    17402      160   118784      480             0 nmbd
Apr 17 01:03:44 home-tn01 kernel: [1294672]     0 1294672     4154       64    73728      224             0 systemd-machine
Apr 17 01:03:44 home-tn01 kernel: [2042945]     0 2042945    54434     4862   196608     8896             0 cli
Apr 17 01:03:44 home-tn01 kernel: [  84902]     0 84902    19944      160   135168      672             0 smbd
Apr 17 01:03:44 home-tn01 kernel: [  84904]     0 84904    19376      144   118784      544             0 smbd-notifyd
Apr 17 01:03:44 home-tn01 kernel: [  84905]     0 84905    19380      144   126976      544             0 smbd-cleanupd
Apr 17 01:03:44 home-tn01 kernel: [ 124322]     0 124322    32086      215   217088     4000             0 smbd[10.0.6.207
Apr 17 01:03:44 home-tn01 kernel: [2311723]     0 2311723   464722     1497   425984    12992             0 middlewared (ze
Apr 17 01:03:44 home-tn01 kernel: [2470848]     0 2470848    26908      353   188416      672             0 smbd[10.0.7.237
Apr 17 01:03:44 home-tn01 kernel: [2678999]   999 2678999     1113      128    53248       64          -900 bash
Apr 17 01:03:44 home-tn01 kernel: [2962980]     0 2962980    45132       64    73728       32             0 nscd
Apr 17 01:03:44 home-tn01 kernel: [2993540]     0 2993540   168846      197   622592    45344             0 middlewared (wo
Apr 17 01:03:44 home-tn01 kernel: [3005495]     0 3005495   168874      192   630784    45376             0 middlewared (wo
Apr 17 01:03:44 home-tn01 kernel: [3006986]     0 3006986   168874      204   630784    45344             0 middlewared (wo
Apr 17 01:03:44 home-tn01 kernel: [3008579]     0 3008579   168844      227   626688    44800             0 middlewared (wo
Apr 17 01:03:44 home-tn01 kernel: [3010195]     0 3010195   150149      184   606208    44576             0 middlewared (wo
Apr 17 01:03:44 home-tn01 kernel: [3015200]   999 3015200     1113      117    53248       52          -900 bash
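A task dump like this is easier to read once it is sorted. Below is a minimal sketch that parses the rows and ranks processes by resident plus swapped pages. It assumes the column layout shown in the header line above and a 4096-byte page size; the function names are hypothetical.

```python
import re

PAGE_SIZE = 4096  # bytes per page (assumption; typical for x86-64)

# One row of the "Tasks state" table: [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW_RE = re.compile(
    r"kernel: \[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>.+)$"
)


def parse_task_rows(lines):
    """Parse table rows into dicts; the header and unrelated lines are skipped."""
    rows = []
    for line in lines:
        match = ROW_RE.search(line.rstrip())
        if match is None:
            continue
        row = {key: int(value) for key, value in match.groupdict().items() if key != "name"}
        row["name"] = match["name"]  # may be truncated by the kernel, e.g. "middlewared (wo"
        rows.append(row)
    return rows


def pages_to_gib(pages):
    return pages * PAGE_SIZE / 2**30


def top_consumers(rows, count=3):
    """Rank by pages held in RAM (rss) plus pages swapped out (swapents)."""
    return sorted(rows, key=lambda row: row["rss"] + row["swapents"], reverse=True)[:count]


# A few rows copied from the dump in this thread, plus the header line.
sample = [
    "Apr 17 01:03:44 home-tn01 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name",
    "Apr 17 01:03:44 home-tn01 kernel: [    909]     0   909  9643999  5092487 73252864  3876860             0 asyncio_loop",
    "Apr 17 01:03:44 home-tn01 kernel: [   2641]   999  2641    93512    13997   327680     2112          -900 netdata",
    "Apr 17 01:03:44 home-tn01 kernel: [  84902]     0 84902    19944      160   135168      672             0 smbd",
    "Apr 17 01:03:44 home-tn01 kernel: [2993540]     0 2993540   168846      197   622592    45344             0 middlewared (wo",
]

rows = parse_task_rows(sample)
for row in top_consumers(rows):
    print(f'{row["name"]:<18} pid {row["pid"]:>8}  '
          f'rss {pages_to_gib(row["rss"]):7.2f} GiB  swap {pages_to_gib(row["swapents"]):7.2f} GiB')
```

Sorting on rss plus swapents rather than total_vm is a deliberate choice here: total_vm counts address space that was reserved but possibly never touched, so it tends to overstate actual use.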

It looks like the asyncio_loop thread had a virtual size (total_vm) of about 37GiB when it was killed, and you only have 16GiB of RAM.

Memory usage (bytes) = 9643999 pages * 4096 bytes/page = 39501819904 bytes
Memory usage (GiB) = 39501819904 bytes / (2^30 bytes/GiB) ≈ 36.79 GiB
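The page-to-GiB conversion can be cross-checked against the kernel's own kB figure from the kill line. A small sketch, assuming 4096-byte pages and using the asyncio_loop values from the logs above (the variable names are just for illustration):

```python
PAGE_SIZE = 4096  # bytes per page (assumption; typical for x86-64)

# Values for pid 909 (asyncio_loop), copied from the logs in this thread.
total_vm_pages = 9_643_999         # total_vm column of the task table
rss_pages = 5_092_487              # rss column of the task table
reported_total_vm_kb = 38_575_996  # total-vm from the "Killed process 909" line


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    return pages * page_size


def bytes_to_gib(num_bytes):
    return num_bytes / 2**30


total_vm_bytes = pages_to_bytes(total_vm_pages)
total_vm_gib = bytes_to_gib(total_vm_bytes)
rss_gib = bytes_to_gib(pages_to_bytes(rss_pages))

# The table (pages) and the kill line (kB) should describe the same size.
figures_agree = total_vm_bytes == reported_total_vm_kb * 1024

print(f"total_vm: {total_vm_bytes} bytes = {total_vm_gib:.2f} GiB")
print(f"rss:      {rss_gib:.2f} GiB")
print(f"table and kill line agree: {figures_agree}")
```

Note that total_vm is address space, while rss is what was actually resident in RAM at the time, so the two numbers answer different questions.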

I’m not sure why that happened, and you should probably file a bug report in Jira.
Issue Reporting in Jira | TrueNAS Documentation Hub

Swap space: is this still accurate?

Swap space is allocated when drives are partitioned before being added to a vdev. A 2 GiB partition for swap space is created on each data drive by default. The size of space to allocate can be changed in System ➞ Advanced in the Swap size in GiB field.

I think I allowed 16GB swap space during install.

I’m going to re-install since the box is stuck here
image

I think this could be a serious bug in 24.04 Dragonfish. I am running the released version, freshly installed, with no apps, just pure file transfer via SMB. The UI becomes frozen and asyncio_loop starts to consume swap (1.6GB of the 8GB swap), even though I have 1TB of RAM: 900GB in ARC, 15GB in services, and 65GB still free. If I stop the transfer, asyncio_loop's swap usage goes down and the UI becomes responsive again.

I can also confirm this issue. Pretty vanilla setup with Time Machine shares. The performance hit is significant…I’ve had to reboot my TrueNAS X several times.

Disable lru_gen.
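For context, lru_gen is the Linux kernel's multi-generational LRU page reclaim feature. On kernels built with it, the switch is exposed at /sys/kernel/mm/lru_gen/enabled: reading it returns a hex bitmask such as 0x0007, and writing "n" turns all components off. The sketch below is a minimal illustration under those assumptions. The function names are hypothetical, writing requires root, and the change does not survive a reboot unless something reapplies it; how best to do that on TrueNAS is not covered in this thread.

```python
from pathlib import Path

# Kernel sysfs switch for the multi-gen LRU (present only if the kernel was built with it).
LRU_GEN_PATH = Path("/sys/kernel/mm/lru_gen/enabled")


def lru_gen_is_enabled(path=LRU_GEN_PATH):
    """Return True if any multi-gen LRU component is switched on.

    The file holds a hex bitmask, e.g. '0x0007' (all on) or '0x0000' (off).
    """
    raw = Path(path).read_text().strip()
    return int(raw, 16) != 0


def disable_lru_gen(path=LRU_GEN_PATH):
    """Write 'n' to switch every component off. Needs root on a real system."""
    Path(path).write_text("n")


if __name__ == "__main__":
    # Read-only status report; call disable_lru_gen() yourself, as root, to change it.
    if not LRU_GEN_PATH.exists():
        print("lru_gen is not available on this kernel")
    elif lru_gen_is_enabled():
        print("lru_gen is enabled")
    else:
        print("lru_gen is disabled")
```

The `path` parameter exists only so the helpers can be tried against an ordinary file before touching the real sysfs entry.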

Appreciated. Just glad to know there’s a straightforward solve. Looks like it’s in the .1 release by default, so I’ll just upgrade and try that route first.

Thanks so much for a timely answer and pointer @Stux.