Unscheduled System Reboots/Core Files Found - During File Transfer

My build reboots while I’m moving data. From searching, I found the cause could be the old PCIe SATA card I had, so I bought an LSI HBA and rebuilt my pool, but it’s still happening. I have a fan blowing on the HBA, since I know those get scalding hot. I believe my PSU is fine, as I don’t think this hardware draws more than 550W. The core files found are smbd.core and python3.9.core. The log files don’t show much of anything around the reboot times, though I do see something about the HBA:

Apr 10 19:28:42 truenas Checking SAS92xx HBAs firmware
Apr 10 19:28:42 truenas Unable to find firmware file with prefix 'mps_SAS9207-8i'
Apr 10 19:28:42 truenas Checking SAS93xx HBAs firmware
Apr 10 19:28:42 truenas Checking HBA94xx HBAs firmware
Apr 10 19:28:42 truenas HBA firmware check complete
Apr 10 19:28:42 truenas Starting zfsd.

Here is the console.log from around the first reboot. I’m not sure whether there’s another log I should check:

Apr 10 19:28:44 truenas
Apr 10 19:28:44 truenas Wed Apr 10 19:28:44 PDT 2024
Apr 10 20:04:05 truenas 1 2024-04-10T20:04:05.195391-07:00 truenas.local middlewared 413 - - Fingerprint of the certificate used in UI : CE:D8:50:61:D5:11:E7:67:D6:A5:39:1A:12:DB:56:6B:03:CE:D9:E9
Apr 11 00:27:02 Home-NAS Starting devd.
Apr 11 00:27:02 Home-NAS Autoloading module: intpm
Apr 11 00:27:02 Home-NAS Starting ums0 moused.
Apr 11 00:27:02 Home-NAS Starting zfsd.
Apr 11 00:27:02 Home-NAS Setting hostuuid: 0a006b9c-7148-0000-0000-000000000000.
Apr 11 00:27:02 Home-NAS Setting hostid: 0x7277cc68.
Apr 11 00:27:02 Home-NAS <118>middlewared: starting
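The excerpt above spans two boots: the last entry under the old hostname is at 20:04:05, and the next boot starts at 00:27:02, so the reboot happened somewhere in that window. A minimal Python sketch for finding such boundaries in a longer console.log, assuming classic syslog timestamps with no year (the caller supplies one) and treating FreeBSD’s "Starting devd." line as a once-per-boot marker. That marker is a heuristic of mine, not something TrueNAS documents as a reboot indicator, and `find_reboots`/`parse_line` are hypothetical helper names:

```python
from datetime import datetime

# Heuristic (assumption): FreeBSD's rc scripts print "Starting devd." once per boot,
# so each occurrence is treated as the start of a new boot.
BOOT_MARKER = "Starting devd."


def parse_line(line, year):
    """Split 'Mon DD HH:MM:SS host message' into (timestamp, message), or None."""
    parts = line.split(None, 4)
    if len(parts) < 4:
        return None
    stamp = f"{year} {' '.join(parts[:3])}"
    try:
        # Classic syslog stamps carry no year, so the caller supplies one.
        when = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
    except ValueError:
        return None
    message = parts[4] if len(parts) == 5 else ""
    return when, message


def find_reboots(lines, year=2024):
    """Return (last entry before a boot, first entry of that boot) pairs."""
    reboots = []
    previous = None
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue  # skip lines that do not start with a syslog timestamp
        when, message = parsed
        if BOOT_MARKER in message and previous is not None:
            reboots.append((previous, when))
        previous = when
    return reboots


if __name__ == "__main__":
    sample = """\
Apr 10 19:28:44 truenas Wed Apr 10 19:28:44 PDT 2024
Apr 10 20:04:05 truenas 1 2024-04-10T20:04:05.195391-07:00 truenas.local middlewared 413 - - Fingerprint of the certificate used in UI
Apr 11 00:27:02 Home-NAS Starting devd.
Apr 11 00:27:02 Home-NAS Autoloading module: intpm"""
    for last_seen, booted in find_reboots(sample.splitlines()):
        print(f"last entry {last_seen}, next boot began {booted}")
```

The gap between the two timestamps only brackets the crash; it does not say what caused it, which is why a hardware test like memtest is still needed.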

Any and all help is appreciated. Many thanks.

Output of sas2flash -listall please.
Also, I would run https://www.memtest.org/.

LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2308_1(D1)

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------

0  SAS2308_1(D1)   20.00.06.00    14.01.00.06    07.39.02.00     00:06:00:00

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.

The issue looks like the one that prompted this firmware release: LSI 9300-xx Firmware Update | TrueNAS Community.
Honestly, I would try reflashing your current firmware.
Just to be sure, output of sas2flash -list?


Memtest. You’re on to something. It failed about 15 minutes into the test. En route to buy new memory and retest.


It might be only one of the two sticks you have, try one after the other.
As a side note, please put your signature inside [details="Summary"] This text will be hidden [/details].

Reduce the number of dimms. Retest.

Remove them all. Attempt to blow out any dust with canned/canless air. Reinstall, retest.

Never actually had memory fail, but have had memory errors.

ECC is a good thing :wink:


There’s an endless supply of it around here. eBay sellers love to bulk up their wares with dodgy DRAM.


Failed memory was (admittedly) cheap memory, so replacing it was a no-brainer.

Replaced with new memory and here’s the result.

@Davvo, here is the information you requested:

LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2308_1(D1)

        Controller Number              : 0
        Controller                     : SAS2308_1(D1)
        PCI Address                    : 00:06:00:00
        SAS Address                    : 56c92bf-0-000a-97ab
        NVDATA Version (Default)       : 14.01.00.06
        NVDATA Version (Persistent)    : 14.01.00.06
        Firmware Product ID            : 0x2214 (IT)
        Firmware Version               : 20.00.06.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9207-8i
        BIOS Version                   : 07.39.02.00
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : SAS9207-8i
        Board Assembly                 : N/A
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.
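If you want to compare `sas2flash -list` output before and after a reflash, or across machines, the "Key : Value" lines can be pulled into a dictionary. A minimal Python sketch, assuming the output layout shown above; `parse_sas2flash_list` and `summarize` are hypothetical helper names, and reading "(IT)" at the end of Firmware Product ID as the IT-mode indicator is based only on this output:

```python
def parse_sas2flash_list(text):
    """Collect 'Key : Value' lines from `sas2flash -list` output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like "00:06:00:00" stay whole.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    """Pull out the fields that matter when checking an HBA flashed to IT mode."""
    product_id = fields.get("Firmware Product ID", "")
    return {
        "board": fields.get("Board Name"),
        "firmware": fields.get("Firmware Version"),
        "it_mode": product_id.endswith("(IT)"),
    }


if __name__ == "__main__":
    sample = """\
        Controller                     : SAS2308_1(D1)
        PCI Address                    : 00:06:00:00
        Firmware Product ID            : 0x2214 (IT)
        Firmware Version               : 20.00.06.00
        Board Name                     : SAS9207-8i"""
    print(summarize(parse_sas2flash_list(sample)))
```

Lines without a colon (the banner, "Exiting SAS2Flash.") are simply skipped.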

As far as I can tell, there is nothing wrong with the HBA. I would also assume camcontrol devlist shows you all of your drives.
If it happens again, try to reflash the BIOS.
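To act on the camcontrol devlist check, you can confirm that each of the eight pool disks (da0 through da7) actually shows up. A minimal Python sketch that parses saved `camcontrol devlist` output, assuming each line has the form `<VENDOR MODEL REV>  at scbusN target N lun N (periph,periph)`; that pattern, the sample lines, and the helper names `disks_by_name`/`missing_disks` are illustrative assumptions:

```python
import re

# Assumption: each `camcontrol devlist` line looks like
#   <VENDOR MODEL REV>   at scbusN target N lun N (periph,periph)
# This pattern is illustrative and may need adjusting for other output.
DEVLIST_LINE = re.compile(r"<(?P<model>[^>]*)>.*\((?P<periphs>[^)]*)\)")


def disks_by_name(text):
    """Map each daN peripheral name to the model string on its devlist line."""
    disks = {}
    for line in text.splitlines():
        match = DEVLIST_LINE.search(line)
        if match is None:
            continue
        for periph in match.group("periphs").split(","):
            periph = periph.strip()
            if re.fullmatch(r"da\d+", periph):
                disks[periph] = match.group("model").strip()
    return disks


def missing_disks(text, expected):
    """Return the expected daN names that do not appear in the devlist output."""
    return sorted(set(expected) - set(disks_by_name(text)))


if __name__ == "__main__":
    # Hypothetical output lines; "REV" is a placeholder firmware revision.
    sample = """\
<ATA ST8000VN004-3CP1 REV>   at scbus0 target 0 lun 0 (pass0,da0)
<ATA ST8000VN004-2M21 REV>   at scbus0 target 3 lun 0 (pass3,da3)"""
    expected = [f"da{i}" for i in range(8)]
    print(missing_disks(sample, expected))
```

An empty list means every expected disk was enumerated behind the HBA.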

Beginning the mass data transfer now to see whether it crashes again. I’ll report back regardless. Thanks, everyone, for the help!

Following up, running solid with no issues since. Marked bad memory as solution. Cheers.


Version: TrueNAS-13.0-U6.1
Motherboard: ASRock B450M PRO4 R2.0 AM4 AMD Promontory B450
PSU: Rosewill PMG 550, 80+ Gold Certified, 550W
CPU: AMD Ryzen 5 2600X Six-Core Processor
RAM: CORSAIR - VENGEANCE LPX 16GB (2 x 8GB) 3600MHz CMK16GX4M2D3600C18
Boot: SanDisk SD8TN8U256G1001
HBA: LSI 9207-8i
pool0:

  • da0: ATA ST8000VN004-3CP1
  • da1: ATA ST8000VN004-3CP1
  • da2: ATA ST8000VN004-3CP1
  • da3: ATA ST8000VN004-2M21
  • da4: ATA ST8000VN004-2M21
  • da5: ATA ST8000VN004-2M21
  • da6: ATA ST8000VN004-2M21
  • da7: ATA ST8000VN004-3CP1

might be worth wrapping that in a details tag

Thanks for that, I’d been trying to figure out what tag that was.