TrueNAS Core: Can't execute SMART check "Not capable of smart self check" resulting in BUG

I finally stumbled across the new forum too, hi guys.
(Nice to see the design is the same as Lawrence Systems, kudos!)

I run a virtualization server that is my main home lab and NAS. In the last few days I started backing up data to my TrueNAS Core VM via SFTP, and to my Nextcloud WebDAV via an NFS share, when I noticed that both transfers were failing.
Upon investigation, the TrueNAS VM showed me two failed SMART attempts for the da5 and da7 drives.


What surprised me was that the system complained that it could not read or execute the SMART self-check on two of the 10 IronWolf drives.

I then started troubleshooting by running first a short and then a long SMART test on a few drives, and that is when the UI started acting up. I tried to run a SMART check on da4, and it loaded for minutes before I finally saw the message that the task had started. The button for da5 did not respond at all. After that I tried selecting several disks at once and also tried scheduling the tests, without success.

After that the GUI partially froze and I could not return to the dashboard, while a few entries such as Disks still worked normally. A day later I logged on to the system and was greeted by a still-buggy UI stuck on a loading screen for the pools.

EDIT: The dashboard seems to work again. I deliberately didn’t restart TrueNAS, to see how it would behave if I hadn’t noticed the fault immediately. The Pools page still doesn’t load, though!

The following errors have been reported since the last login:

Failed to check for alert Quota:
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 111, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 483, in query_for_quota_alert
    config = self.middleware.call_sync("systemdataset.config")
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 78, in call_sync
    return self.client.call(method, *params, timeout=timeout, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 456, in call
    raise c.py_exception
RuntimeError: can't start new thread
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/alert.py", line 740, in __run_source
    alerts = (await alert_source.check()) or []
  File "/usr/local/lib/python3.9/site-packages/middlewared/alert/base.py", line 212, in check
    return await self.middleware.run_in_thread(self.check_sync)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1159, in run_in_thread
    return await self.run_in_executor(self.thread_pool_executor, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/io_thread_pool_executor.py", line 43, in worker
    fut.set_result(fn(*args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/middlewared/alert/source/quota.py", line 38, in check_sync
    datasets = self.middleware.call_sync("zfs.dataset.query_for_quota_alert")
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1303, in call_sync
    return self.run_coroutine(self._call_worker(name, *prepared_call.args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1339, in run_coroutine
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1254, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1173, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
RuntimeError: can't start new thread

Failed to check for alert UnencryptedDatasets:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/alert.py", line 740, in __run_source
    alerts = (await alert_source.check()) or []
  File "/usr/local/lib/python3.9/site-packages/middlewared/alert/source/datasets.py", line 18, in check
    for dataset in await self.middleware.call('pool.dataset.query', [['encrypted', '=', True]]):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1251, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/io_thread_pool_executor.py", line 43, in worker
    fut.set_result(fn(*args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 985, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 2820, in query
    sys_config = self.middleware.call_sync('systemdataset.config')
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1299, in call_sync
    return self.run_coroutine(methodobj(*prepared_call.args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1339, in run_coroutine
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/service.py", line 385, in config
    return await self._get_or_insert(self._config.datastore, options)
  File "/usr/local/lib/python3.9/site-packages/middlewared/service.py", line 397, in _get_or_insert
    return await self.middleware.call('datastore.config', datastore, options)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1240, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 186, in config
    return await self.query(name, [], options)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 164, in query
    result = await self._queryset_serialize(
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 214, in _queryset_serialize
    result.append(await self._serialize(
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 232, in _serialize
    data = await self.middleware.call(extend, data)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1240, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/sysdataset.py", line 55, in config_extend
    licensed = await self.middleware.call('failover.licensed')
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1251, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/io_thread_pool_executor.py", line 43, in worker
    fut.set_result(fn(*args, **kwargs))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 985, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/middlewared_truenas/plugins/failover.py", line 192, in licensed
    info = self.middleware.call_sync('system.info')
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1299, in call_sync
    return self.run_coroutine(methodobj(*prepared_call.args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1339, in run_coroutine
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/system.py", line 655, in info
    dmidecode = await self.middleware.call('system.dmidecode_info')
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1251, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 819, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/io_thread_pool_executor.py", line 35, in submit
    start_daemon_thread(name=f"ExtraIoThread_{next(counter)}", target=worker, args=(fut, fn, args, kwargs))
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/io_thread_pool_executor.py", line 18, in start_daemon_thread
    t.start()
  File "/usr/local/lib/python3.9/threading.py", line 899, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

And while I could still access the disks tab, I saw that the SMART test for a drive that previously worked also failed.

Does anyone have any idea what’s going on here? Why is the GUI so badly affected by this error and why can’t I even start a SMART test to begin with?
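
Both alerts above go through very different call chains but end in the same line, `RuntimeError: can't start new thread`, which Python raises when the operating system refuses to create another thread. A minimal sketch for pulling that final error out of flattened alert text is shown below. The function name `root_errors` and the shortened sample strings are illustrative assumptions, not part of the TrueNAS middleware.

```python
import re
from collections import Counter

# Matches "SomeError: message" where the message runs to the end of the text
# or to the closing triple quote of an embedded remote traceback.
ERROR_RE = re.compile(r'\b(\w+Error): ([^"]+?)(?= """|$)')


def root_errors(alert_text):
    """Count each distinct (error type, message) pair in a flattened alert."""
    return Counter(ERROR_RE.findall(alert_text.strip()))


# Abbreviated copies of the two alerts (paths and most frames shortened).
quota_alert = (
    "Failed to check for alert Quota: concurrent.futures.process._RemoteTraceback: "
    '""" Traceback (most recent call last): File "worker.py", line 39, in _call '
    "return methodobj(*params) RuntimeError: can't start new thread \"\"\" "
    "The above exception was the direct cause of the following exception: "
    'Traceback (most recent call last): File "main.py", line 1156, in run_in_executor '
    "RuntimeError: can't start new thread"
)
datasets_alert = (
    "Failed to check for alert UnencryptedDatasets: Traceback (most recent call last): "
    'File "threading.py", line 899, in start _start_new_thread(self._bootstrap, ()) '
    "RuntimeError: can't start new thread"
)

if __name__ == "__main__":
    # Summing the counters shows whether unrelated alerts share one root error.
    print(root_errors(quota_alert) + root_errors(datasets_alert))
```

If every alert reduces to the same pair, the alert names (Quota, UnencryptedDatasets) are probably incidental and the thread limit is the thing to investigate.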

My system:
Hypervisor: UNRAID
VM: TrueNAS-13.0-U6.1
Disks: 10x 4 TB IronWolf (2x 5-disk RAIDZ2) & 6x 2 TB Samsung SSD (3x mirrors), distributed across two controllers

Controller #1: LSI 9300-16i HBA passed through to the VM (only half is passed through; the card is actually two 8i controllers) with the recommended TrueNAS firmware 16.00.12.00 (8x 4 TB HDD)

Controller #2: Mainboard SATA of a Pro WS W680-ACE with the 6 SSDs and two more HDDs (also passed through)

Before that, TrueNAS was running smoothly and I even had transfer rates of over 2 Gb/s from local Windows VMs, but now I’m not sure whether the software, the virtualization or the drives are the problem. I remember having problems with the da7 drive in the past, but as far as I know the device names can change, and unfortunately I didn’t record the drive ID. That aside, the UI should work normally even with a failed drive.
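
Since device names like da7 can move between boots, the serial number is the more reliable handle. The sketch below reads the model, serial and WWN from the information section of `smartctl` output so they can be noted down per drive. The function name `parse_identity` is an illustrative assumption, and the sample lines are copied from the `smartctl -x /dev/da5` output posted further down in this thread.

```python
def parse_identity(smartctl_text):
    """Pull model, serial and WWN out of the smartctl information section."""
    wanted = {
        "Device Model": "model",
        "Serial Number": "serial",
        "LU WWN Device Id": "wwn",
    }
    info = {}
    for line in smartctl_text.splitlines():
        # Lines look like "Serial Number:    ZW60RHSE"; split at the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            info[wanted[key.strip()]] = value.strip()
    return info


# Sample lines from the da5 output in this thread.
sample = """\
Device Model:     ST4000VN006-3CW104
Serial Number:    ZW60RHSE
LU WWN Device Id: 5 000c50 0e6e64019
Firmware Version: SC60
"""

if __name__ == "__main__":
    print(parse_identity(sample))
```

Keeping one such record per bay makes it possible to tell later whether "da7" is still the same physical disk.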

Does anyone have any ideas on how to troubleshoot/fix this?
I would start by swapping drive bays and potentially trying new drives, but without even being able to run a SMART check this seems to be a dead end.

Thanks for any reply.

Seems like an odd choice, but whatever.

Let’s work from the basics:

What’s the output of camcontrol devlist and smartctl -x /dev/da7? Does smartctl -t long work as expected?

I’m not a professional IT guy, and Proxmox scared me off because it’s geared more towards clustering, and even basic things like CPU pinning (which I need) had to be done in the console when I explored it 2 years ago. The hypervisor is KVM; calling it Unraid was imprecise, my bad. As far as I know, TrueNAS on Unraid is a pretty common combination for home users in home labs. The other products with monthly plans, like ESXi and VMware, are more commonly used in enterprise; at least I’ve found rather little about those systems for home use. What do you typically see as hypervisors in the community? Enough off-topic, I was just curious :P

So here is the camcontrol devlist output:

root@truenas[~]# clear
root@truenas[~]# camcontrol devlist
<Samsung SSD 870 EVO 2TB SVT03B6Q>  at scbus2 target 0 lun 0 (ada0,pass0)
<Samsung SSD 870 EVO 2TB SVT03B6Q>  at scbus3 target 0 lun 0 (ada1,pass1)
<Samsung SSD 870 EVO 2TB SVT03B6Q>  at scbus4 target 0 lun 0 (ada2,pass2)
<Samsung SSD 870 EVO 2TB SVT03B6Q>  at scbus5 target 0 lun 0 (ada3,pass3)
<ST4000VN006-3CW104 SC60>          at scbus6 target 0 lun 0 (ada4,pass4)
<ST4000VN006-3CW104 SC60>          at scbus7 target 0 lun 0 (ada5,pass5)
<Samsung SSD 870 EVO 2TB SVT03B6Q>  at scbus8 target 0 lun 0 (ada6,pass6)
<Samsung SSD 870 EVO 2TB SVT03B6Q>  at scbus9 target 0 lun 0 (ada7,pass7)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus10 target 0 lun 0 (ses0,pass8)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 0 lun 0 (pass9,da0)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 1 lun 0 (pass10,da1)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 2 lun 0 (pass11,da2)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 3 lun 0 (pass12,da3)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 4 lun 0 (pass13,da4)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 5 lun 0 (pass14,da5)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 6 lun 0 (pass15,da6)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 7 lun 0 (pass16,da7)
root@truenas[~]# 
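
To see at a glance which disks hang off which controller, the listing above can be grouped by `scbus` number. This is a rough sketch that assumes every device line follows the `<description> at scbusN target T lun L (names)` layout seen here; the function name `devices_by_bus` is made up for illustration, and the sample is a four-line subset of the output above.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(
    r"^<(?P<desc>[^>]+)>\s+at scbus(?P<bus>\d+) target (?P<target>\d+) "
    r"lun (?P<lun>\d+) \((?P<names>[^)]+)\)"
)


def devices_by_bus(devlist_text):
    """Group (device name, description) pairs by their scbus number."""
    buses = defaultdict(list)
    for line in devlist_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skips shell prompts and blank lines
        # Each entry lists a passN node plus the real device node, in either order.
        for name in match["names"].split(","):
            if not name.startswith("pass"):
                buses[int(match["bus"])].append((name, match["desc"].strip()))
    return dict(buses)


# Subset of the camcontrol devlist output above.
sample = """\
<ST4000VN006-3CW104 SC60>          at scbus6 target 0 lun 0 (ada4,pass4)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus10 target 0 lun 0 (ses0,pass8)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 5 lun 0 (pass14,da5)
<ATA ST4000VN006-3CW1 SC60>        at scbus11 target 7 lun 0 (pass16,da7)
"""

if __name__ == "__main__":
    for bus, devices in sorted(devices_by_bus(sample).items()):
        print(bus, devices)
```

In the full listing, da0 through da7 all sit on scbus11, while the ada devices each have their own bus.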

Here are the three smartctl outputs for da5, da6 and da7; da6 has not caused any problems:

root@truenas[~]# smartctl -x /dev/da5
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN006-3CW104
Serial Number:    ZW60RHSE
LU WWN Device Id: 5 000c50 0e6e64019
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Apr  9 22:04:49 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 463) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   080   064   006    -    98621377
  3 Spin_Up_Time            PO----   096   095   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    178
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   068   060   045    -    6488698
  9 Power_On_Hours          -O--CK   098   098   000    -    2194
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    178
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   099   000    -    1
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   059   053   040    -    41 (Min/Max 40/44)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    157
193 Load_Cycle_Count        -O--CK   100   100   000    -    270
194 Temperature_Celsius     -O---K   041   047   000    -    41 (0 25 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   080   064   000    -    98621377
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    1144 (20 49 0)
241 Total_LBAs_Written      ------   100   253   000    -    4634108168
242 Total_LBAs_Read         ------   100   253   000    -    2547613389
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    512  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      24  Device vendor specific log
0xa2       GPL     VS    8160  Device vendor specific log
0xa6       GPL     VS     192  Device vendor specific log
0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    9048  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      16  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS      24  Device vendor specific log
0xd1       GPL     VS     264  Device vendor specific log
0xd3       GPL     VS    1920  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2188         -
# 2  Extended offline    Interrupted (host reset)      00%      2180         -
# 3  Short offline       Completed without error       00%      2178         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
Device State:                        Active (0)
Current Temperature:                    41 Celsius
Power Cycle Min/Max Temperature:     40/44 Celsius
Lifetime    Min/Max Temperature:     25/47 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         3 minutes
Temperature Logging Interval:        94 minutes
Min/Max recommended Temperature:      1/61 Celsius
Min/Max Temperature Limit:            2/60 Celsius
Temperature History Size (Index):    128 (41)

Index    Estimated Time   Temperature Celsius
  42    2024-04-01 13:34     ?  -
  43    2024-04-01 15:08    34  ***************
  44    2024-04-01 16:42     ?  -
  45    2024-04-01 18:16    37  ******************
  46    2024-04-01 19:50     ?  -
  47    2024-04-01 21:24    37  ******************
  48    2024-04-01 22:58     ?  -
  49    2024-04-02 00:32    43  ************************
  50    2024-04-02 02:06     ?  -
  51    2024-04-02 03:40    43  ************************
  52    2024-04-02 05:14     ?  -
  53    2024-04-02 06:48    25  ******
  54    2024-04-02 08:22     ?  -
  55    2024-04-02 09:56    25  ******
  56    2024-04-02 11:30     ?  -
  57    2024-04-02 13:04    25  ******
  58    2024-04-02 14:38     ?  -
  59    2024-04-02 16:12    35  ****************
  60    2024-04-02 17:46     ?  -
  61    2024-04-02 19:20    38  *******************
  62    2024-04-02 20:54     ?  -
  63    2024-04-02 22:28    38  *******************
  64    2024-04-03 00:02     ?  -
  65    2024-04-03 01:36    38  *******************
  66    2024-04-03 03:10     ?  -
  67    2024-04-03 04:44    38  *******************
  68    2024-04-03 06:18     ?  -
  69    2024-04-03 07:52    38  *******************
  70    2024-04-03 09:26     ?  -
  71    2024-04-03 11:00    34  ***************
  72    2024-04-03 12:34     ?  -
  73    2024-04-03 14:08    38  *******************
  74    2024-04-03 15:42     ?  -
  75    2024-04-03 17:16    38  *******************
  76    2024-04-03 18:50     ?  -
  77    2024-04-03 20:24    37  ******************
  78    2024-04-03 21:58     ?  -
  79    2024-04-03 23:32    34  ***************
  80    2024-04-04 01:06     ?  -
  81    2024-04-04 02:40    38  *******************
  82    2024-04-04 04:14     ?  -
  83    2024-04-04 05:48    28  *********
  84    2024-04-04 07:22     ?  -
  85    2024-04-04 08:56    27  ********
  86    2024-04-04 10:30     ?  -
  87    2024-04-04 12:04    28  *********
  88    2024-04-04 13:38     ?  -
  89    2024-04-04 15:12    33  **************
  90    2024-04-04 16:46     ?  -
  91    2024-04-04 18:20    34  ***************
  92    2024-04-04 19:54     ?  -
  93    2024-04-04 21:28    36  *****************
  94    2024-04-04 23:02     ?  -
  95    2024-04-05 00:36    36  *****************
  96    2024-04-05 02:10     ?  -
  97    2024-04-05 03:44    36  *****************
  98    2024-04-05 05:18     ?  -
  99    2024-04-05 06:52    36  *****************
 100    2024-04-05 08:26     ?  -
 101    2024-04-05 10:00    37  ******************
 102    2024-04-05 11:34     ?  -
 103    2024-04-05 13:08    38  *******************
 104    2024-04-05 14:42     ?  -
 105    2024-04-05 16:16    39  ********************
 106    2024-04-05 17:50     ?  -
 107    2024-04-05 19:24    39  ********************
 108    2024-04-05 20:58     ?  -
 109    2024-04-05 22:32    39  ********************
 110    2024-04-06 00:06     ?  -
 111    2024-04-06 01:40    40  *********************
 112    2024-04-06 03:14     ?  -
 113    2024-04-06 04:48    38  *******************
 114    2024-04-06 06:22     ?  -
 115    2024-04-06 07:56    38  *******************
 116    2024-04-06 09:30     ?  -
 117    2024-04-06 11:04    39  ********************
 118    2024-04-06 12:38     ?  -
 119    2024-04-06 14:12    39  ********************
 120    2024-04-06 15:46     ?  -
 121    2024-04-06 17:20    39  ********************
 122    2024-04-06 18:54     ?  -
 123    2024-04-06 20:28    39  ********************
 124    2024-04-06 22:02     ?  -
 125    2024-04-06 23:36    33  **************
 126    2024-04-07 01:10    39  ********************
 127    2024-04-07 02:44    39  ********************
   0    2024-04-07 04:18    40  *********************
   1    2024-04-07 05:52    40  *********************
   2    2024-04-07 07:26    41  **********************
   3    2024-04-07 09:00    41  **********************
   4    2024-04-07 10:34    41  **********************
   5    2024-04-07 12:08    42  ***********************
 ...    ..(  3 skipped).    ..  ***********************
   9    2024-04-07 18:24    42  ***********************
  10    2024-04-07 19:58    41  **********************
 ...    ..(  4 skipped).    ..  **********************
  15    2024-04-08 03:48    41  **********************
  16    2024-04-08 05:22    40  *********************
  17    2024-04-08 06:56    39  ********************
  18    2024-04-08 08:30    39  ********************
  19    2024-04-08 10:04    40  *********************
  20    2024-04-08 11:38    40  *********************
  21    2024-04-08 13:12    41  **********************
 ...    ..(  6 skipped).    ..  **********************
  28    2024-04-09 00:10    41  **********************
  29    2024-04-09 01:44     ?  -
  30    2024-04-09 03:18    40  *********************
  31    2024-04-09 04:52    44  *************************
  32    2024-04-09 06:26     ?  -
  33    2024-04-09 08:00    43  ************************
  34    2024-04-09 09:34    44  *************************
  35    2024-04-09 11:08    44  *************************
  36    2024-04-09 12:42    43  ************************
  37    2024-04-09 14:16    43  ************************
  38    2024-04-09 15:50    42  ***********************
  39    2024-04-09 17:24    40  *********************
  40    2024-04-09 18:58    40  *********************
  41    2024-04-09 20:32    40  *********************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4             178  ---  Lifetime Power-On Resets
0x01  0x010  4            2194  ---  Power-on Hours
0x01  0x018  6      4634115056  ---  Logical Sectors Written
0x01  0x020  6        13201635  ---  Number of Write Commands
0x01  0x028  6      2550047458  ---  Logical Sectors Read
0x01  0x030  6         3594936  ---  Number of Read Commands
0x01  0x038  6               -  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4            1191  ---  Spindle Motor Power-on Hours
0x03  0x010  4            1148  ---  Head Flying Hours
0x03  0x018  4             270  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x03  0x038  4               0  ---  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4             157  ---  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               1  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              41  ---  Current Temperature
0x05  0x010  1              41  ---  Average Short Term Temperature
0x05  0x018  1               -  ---  Average Long Term Temperature
0x05  0x020  1              47  ---  Highest Temperature
0x05  0x028  1              28  ---  Lowest Temperature
0x05  0x030  1              45  ---  Highest Average Short Term Temperature
0x05  0x038  1              33  ---  Lowest Average Short Term Temperature
0x05  0x040  1               -  ---  Highest Average Long Term Temperature
0x05  0x048  1               -  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              70  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4           79656  ---  Number of Hardware Resets
0x06  0x010  4           39751  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

root@truenas[~]#

Last night’s SMART tests all seem to have worked, and I just started another one. The failed ones were interrupted by a shutdown. I think the extra runs listed for da4 were started before the last boot, when I clicked the button a couple of times while the system wasn’t responding.


I hope this was a one-time bug; I will let you know if I manage to reproduce it. Any opinions on the drives? Could they still be faulty?
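
One value that stands out to me is the Command_Timeout raw value on da7 further down (4295042434, against 4 on da6). As far as I understand, Seagate drives often pack several 16-bit counters into one 48-bit raw value, so a huge number there is not necessarily a huge count. That packing is an assumption on my side, not something smartctl prints, and `split_raw48` is just a helper name I made up. A minimal sketch that splits the raw values into 16-bit words:

```python
def split_raw48(raw: int) -> tuple:
    """Split a 48-bit SMART raw value into three 16-bit words (high, mid, low).

    Assumption: the drive packs three 16-bit counters into the raw value.
    This is a common reading for Seagate attribute 188, not a guarantee.
    """
    high = (raw >> 32) & 0xFFFF
    mid = (raw >> 16) & 0xFFFF
    low = raw & 0xFFFF
    return high, mid, low


# Command_Timeout raw values copied from the smartctl output in this post
command_timeout_raw = {"da6": 4, "da7": 4295042434}

for disk, raw in command_timeout_raw.items():
    high, mid, low = split_raw48(raw)
    print(f"{disk}: raw={raw} -> high={high} mid={mid} low={low}")
```

If that reading is right, the low word would be the actual timeout count, which would still be much higher on da7 than on da6 and would fit the WORST value of 001 for that attribute. I would treat this as a hint to check cabling, the controller and the passthrough setup rather than as proof of a bad disk.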

I hit the character limit, so here are the remaining smartctl outputs:

root@truenas[~]# smartctl -x /dev/da6
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN006-3CW104
Serial Number:    ZW60REX6
LU WWN Device Id: 5 000c50 0e6e6667f
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Apr  9 22:05:48 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 449) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   080   064   006    -    98630163
  3 Spin_Up_Time            PO----   095   095   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    145
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   068   060   045    -    6357903
  9 Power_On_Hours          -O--CK   098   098   000    -    2067
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    134
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   098   000    -    4
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   059   053   040    -    41 (Min/Max 40/44)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    106
193 Load_Cycle_Count        -O--CK   100   100   000    -    209
194 Temperature_Celsius     -O---K   041   047   000    -    41 (0 25 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   080   064   000    -    98630163
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    1021 (175 140 0)
241 Total_LBAs_Written      ------   100   253   000    -    4463958880
242 Total_LBAs_Read         ------   100   253   000    -    2449082669
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    512  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      24  Device vendor specific log
0xa2       GPL     VS    8160  Device vendor specific log
0xa6       GPL     VS     192  Device vendor specific log
0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    9048  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      16  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS      24  Device vendor specific log
0xd1       GPL     VS     264  Device vendor specific log
0xd3       GPL     VS    1920  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2061         -
# 2  Extended offline    Interrupted (host reset)      00%      2053         -
# 3  Short offline       Completed without error       00%      2051         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
Device State:                        Active (0)
Current Temperature:                    41 Celsius
Power Cycle Min/Max Temperature:     40/44 Celsius
Lifetime    Min/Max Temperature:     25/47 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         3 minutes
Temperature Logging Interval:        94 minutes
Min/Max recommended Temperature:      1/61 Celsius
Min/Max Temperature Limit:            2/60 Celsius
Temperature History Size (Index):    128 (53)

Index    Estimated Time   Temperature Celsius
  54    2024-04-01 13:34    34  ***************
  55    2024-04-01 15:08     ?  -
  56    2024-04-01 16:42    37  ******************
  57    2024-04-01 18:16     ?  -
  58    2024-04-01 19:50    38  *******************
  59    2024-04-01 21:24     ?  -
  60    2024-04-01 22:58    43  ************************
  61    2024-04-02 00:32     ?  -
  62    2024-04-02 02:06    43  ************************
  63    2024-04-02 03:40     ?  -
  64    2024-04-02 05:14    25  ******
  65    2024-04-02 06:48     ?  -
  66    2024-04-02 08:22    25  ******
  67    2024-04-02 09:56     ?  -
  68    2024-04-02 11:30    25  ******
  69    2024-04-02 13:04     ?  -
  70    2024-04-02 14:38    35  ****************
  71    2024-04-02 16:12     ?  -
  72    2024-04-02 17:46    38  *******************
  73    2024-04-02 19:20     ?  -
  74    2024-04-02 20:54    38  *******************
  75    2024-04-02 22:28     ?  -
  76    2024-04-03 00:02    38  *******************
  77    2024-04-03 01:36     ?  -
  78    2024-04-03 03:10    38  *******************
  79    2024-04-03 04:44     ?  -
  80    2024-04-03 06:18    38  *******************
  81    2024-04-03 07:52     ?  -
  82    2024-04-03 09:26    35  ****************
  83    2024-04-03 11:00     ?  -
  84    2024-04-03 12:34    38  *******************
  85    2024-04-03 14:08     ?  -
  86    2024-04-03 15:42    38  *******************
  87    2024-04-03 17:16     ?  -
  88    2024-04-03 18:50    37  ******************
  89    2024-04-03 20:24     ?  -
  90    2024-04-03 21:58    34  ***************
  91    2024-04-03 23:32     ?  -
  92    2024-04-04 01:06    37  ******************
  93    2024-04-04 02:40     ?  -
  94    2024-04-04 04:14    29  **********
  95    2024-04-04 05:48     ?  -
  96    2024-04-04 07:22    27  ********
  97    2024-04-04 08:56     ?  -
  98    2024-04-04 10:30    28  *********
  99    2024-04-04 12:04     ?  -
 100    2024-04-04 13:38    33  **************
 101    2024-04-04 15:12     ?  -
 102    2024-04-04 16:46    34  ***************
 103    2024-04-04 18:20     ?  -
 104    2024-04-04 19:54    36  *****************
 105    2024-04-04 21:28     ?  -
 106    2024-04-04 23:02    36  *****************
 107    2024-04-05 00:36     ?  -
 108    2024-04-05 02:10    36  *****************
 109    2024-04-05 03:44     ?  -
 110    2024-04-05 05:18    36  *****************
 111    2024-04-05 06:52     ?  -
 112    2024-04-05 08:26    36  *****************
 113    2024-04-05 10:00     ?  -
 114    2024-04-05 11:34    38  *******************
 115    2024-04-05 13:08     ?  -
 116    2024-04-05 14:42    39  ********************
 117    2024-04-05 16:16     ?  -
 118    2024-04-05 17:50    39  ********************
 119    2024-04-05 19:24     ?  -
 120    2024-04-05 20:58    39  ********************
 121    2024-04-05 22:32     ?  -
 122    2024-04-06 00:06    40  *********************
 123    2024-04-06 01:40     ?  -
 124    2024-04-06 03:14    38  *******************
 125    2024-04-06 04:48     ?  -
 126    2024-04-06 06:22    38  *******************
 127    2024-04-06 07:56     ?  -
   0    2024-04-06 09:30    39  ********************
   1    2024-04-06 11:04     ?  -
   2    2024-04-06 12:38    39  ********************
   3    2024-04-06 14:12     ?  -
   4    2024-04-06 15:46    39  ********************
   5    2024-04-06 17:20     ?  -
   6    2024-04-06 18:54    39  ********************
   7    2024-04-06 20:28     ?  -
   8    2024-04-06 22:02    33  **************
   9    2024-04-06 23:36    39  ********************
  10    2024-04-07 01:10    39  ********************
  11    2024-04-07 02:44    40  *********************
  12    2024-04-07 04:18    40  *********************
  13    2024-04-07 05:52    41  **********************
  14    2024-04-07 07:26    41  **********************
  15    2024-04-07 09:00    41  **********************
  16    2024-04-07 10:34    42  ***********************
 ...    ..(  3 skipped).    ..  ***********************
  20    2024-04-07 16:50    42  ***********************
  21    2024-04-07 18:24    41  **********************
 ...    ..(  4 skipped).    ..  **********************
  26    2024-04-08 02:14    41  **********************
  27    2024-04-08 03:48    40  *********************
  28    2024-04-08 05:22    39  ********************
  29    2024-04-08 06:56    40  *********************
  30    2024-04-08 08:30    40  *********************
  31    2024-04-08 10:04    41  **********************
  32    2024-04-08 11:38    41  **********************
  33    2024-04-08 13:12    41  **********************
  34    2024-04-08 14:46    42  ***********************
  35    2024-04-08 16:20    41  **********************
 ...    ..(  4 skipped).    ..  **********************
  40    2024-04-09 00:10    41  **********************
  41    2024-04-09 01:44     ?  -
  42    2024-04-09 03:18    40  *********************
  43    2024-04-09 04:52    44  *************************
  44    2024-04-09 06:26     ?  -
  45    2024-04-09 08:00    44  *************************
  46    2024-04-09 09:34    44  *************************
  47    2024-04-09 11:08    44  *************************
  48    2024-04-09 12:42    43  ************************
  49    2024-04-09 14:16    43  ************************
  50    2024-04-09 15:50    42  ***********************
  51    2024-04-09 17:24    41  **********************
  52    2024-04-09 18:58    41  **********************
  53    2024-04-09 20:32    40  *********************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4             134  ---  Lifetime Power-On Resets
0x01  0x010  4            2067  ---  Power-on Hours
0x01  0x018  6      4463963984  ---  Logical Sectors Written
0x01  0x020  6        13012029  ---  Number of Write Commands
0x01  0x028  6      2451520300  ---  Logical Sectors Read
0x01  0x030  6         3148100  ---  Number of Read Commands
0x01  0x038  6               -  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4            1082  ---  Spindle Motor Power-on Hours
0x03  0x010  4            1025  ---  Head Flying Hours
0x03  0x018  4             209  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x03  0x038  4               0  ---  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4             106  ---  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               4  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              41  ---  Current Temperature
0x05  0x010  1              41  ---  Average Short Term Temperature
0x05  0x018  1               -  ---  Average Long Term Temperature
0x05  0x020  1              47  ---  Highest Temperature
0x05  0x028  1              27  ---  Lowest Temperature
0x05  0x030  1              46  ---  Highest Average Short Term Temperature
0x05  0x038  1              39  ---  Lowest Average Short Term Temperature
0x05  0x040  1               -  ---  Highest Average Long Term Temperature
0x05  0x048  1               -  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              70  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4           49241  ---  Number of Hardware Resets
0x06  0x010  4           24450  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

root@truenas[~]#
root@truenas[~]# smartctl -x /dev/da7
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST4000VN006-3CW104
Serial Number:    ZW60REWA
LU WWN Device Id: 5 000c50 0e6e66c29
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Apr  9 22:06:35 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 456) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   075   064   006    -    28424534
  3 Spin_Up_Time            PO----   096   096   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    179
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   068   060   045    -    6624357
  9 Power_On_Hours          -O--CK   098   098   000    -    2194
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    179
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   001   000    -    4295042434
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   060   053   040    -    40 (Min/Max 40/44)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    158
193 Load_Cycle_Count        -O--CK   100   100   000    -    271
194 Temperature_Celsius     -O---K   040   047   000    -    40 (0 25 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   075   064   000    -    28424534
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    1144 (126 110 0)
241 Total_LBAs_Written      ------   100   253   000    -    4634216888
242 Total_LBAs_Read         ------   100   253   000    -    2547547416
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    512  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      24  Device vendor specific log
0xa2       GPL     VS    8160  Device vendor specific log
0xa6       GPL     VS     192  Device vendor specific log
0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    9048  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      16  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS      24  Device vendor specific log
0xd1       GPL     VS     264  Device vendor specific log
0xd3       GPL     VS    1920  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2188         -
# 2  Extended offline    Interrupted (host reset)      00%      2180         -
# 3  Short offline       Completed without error       00%      2178         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
Device State:                        Active (0)
Current Temperature:                    40 Celsius
Power Cycle Min/Max Temperature:     40/44 Celsius
Lifetime    Min/Max Temperature:     25/47 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         3 minutes
Temperature Logging Interval:        94 minutes
Min/Max recommended Temperature:      1/61 Celsius
Min/Max Temperature Limit:            2/60 Celsius
Temperature History Size (Index):    128 (42)

Index    Estimated Time   Temperature Celsius
  43    2024-04-01 15:08     ?  -
  44    2024-04-01 16:42    34  ***************
  45    2024-04-01 18:16     ?  -
  46    2024-04-01 19:50    36  *****************
  47    2024-04-01 21:24     ?  -
  48    2024-04-01 22:58    37  ******************
  49    2024-04-02 00:32     ?  -
  50    2024-04-02 02:06    43  ************************
  51    2024-04-02 03:40     ?  -
  52    2024-04-02 05:14    43  ************************
  53    2024-04-02 06:48     ?  -
  54    2024-04-02 08:22    25  ******
  55    2024-04-02 09:56     ?  -
  56    2024-04-02 11:30    25  ******
  57    2024-04-02 13:04     ?  -
  58    2024-04-02 14:38    25  ******
  59    2024-04-02 16:12     ?  -
  60    2024-04-02 17:46    35  ****************
  61    2024-04-02 19:20     ?  -
  62    2024-04-02 20:54    38  *******************
  63    2024-04-02 22:28     ?  -
  64    2024-04-03 00:02    38  *******************
  65    2024-04-03 01:36     ?  -
  66    2024-04-03 03:10    38  *******************
  67    2024-04-03 04:44     ?  -
  68    2024-04-03 06:18    38  *******************
  69    2024-04-03 07:52     ?  -
  70    2024-04-03 09:26    38  *******************
  71    2024-04-03 11:00     ?  -
  72    2024-04-03 12:34    34  ***************
  73    2024-04-03 14:08     ?  -
  74    2024-04-03 15:42    38  *******************
  75    2024-04-03 17:16     ?  -
  76    2024-04-03 18:50    38  *******************
  77    2024-04-03 20:24     ?  -
  78    2024-04-03 21:58    36  *****************
  79    2024-04-03 23:32     ?  -
  80    2024-04-04 01:06    33  **************
  81    2024-04-04 02:40     ?  -
  82    2024-04-04 04:14    38  *******************
  83    2024-04-04 05:48     ?  -
  84    2024-04-04 07:22    28  *********
  85    2024-04-04 08:56     ?  -
  86    2024-04-04 10:30    27  ********
  87    2024-04-04 12:04     ?  -
  88    2024-04-04 13:38    27  ********
  89    2024-04-04 15:12     ?  -
  90    2024-04-04 16:46    33  **************
  91    2024-04-04 18:20     ?  -
  92    2024-04-04 19:54    33  **************
  93    2024-04-04 21:28     ?  -
  94    2024-04-04 23:02    36  *****************
  95    2024-04-05 00:36     ?  -
  96    2024-04-05 02:10    36  *****************
  97    2024-04-05 03:44     ?  -
  98    2024-04-05 05:18    36  *****************
  99    2024-04-05 06:52     ?  -
 100    2024-04-05 08:26    36  *****************
 101    2024-04-05 10:00     ?  -
 102    2024-04-05 11:34    36  *****************
 103    2024-04-05 13:08     ?  -
 104    2024-04-05 14:42    38  *******************
 105    2024-04-05 16:16     ?  -
 106    2024-04-05 17:50    39  ********************
 107    2024-04-05 19:24     ?  -
 108    2024-04-05 20:58    39  ********************
 109    2024-04-05 22:32     ?  -
 110    2024-04-06 00:06    39  ********************
 111    2024-04-06 01:40     ?  -
 112    2024-04-06 03:14    40  *********************
 113    2024-04-06 04:48     ?  -
 114    2024-04-06 06:22    38  *******************
 115    2024-04-06 07:56     ?  -
 116    2024-04-06 09:30    38  *******************
 117    2024-04-06 11:04     ?  -
 118    2024-04-06 12:38    39  ********************
 119    2024-04-06 14:12     ?  -
 120    2024-04-06 15:46    39  ********************
 121    2024-04-06 17:20     ?  -
 122    2024-04-06 18:54    39  ********************
 123    2024-04-06 20:28     ?  -
 124    2024-04-06 22:02    39  ********************
 125    2024-04-06 23:36     ?  -
 126    2024-04-07 01:10    32  *************
 127    2024-04-07 02:44    39  ********************
   0    2024-04-07 04:18    40  *********************
   1    2024-04-07 05:52    40  *********************
   2    2024-04-07 07:26    41  **********************
   3    2024-04-07 09:00    41  **********************
   4    2024-04-07 10:34    41  **********************
   5    2024-04-07 12:08    42  ***********************
 ...    ..(  2 skipped).    ..  ***********************
   8    2024-04-07 16:50    42  ***********************
   9    2024-04-07 18:24    41  **********************
 ...    ..(  4 skipped).    ..  **********************
  14    2024-04-08 02:14    41  **********************
  15    2024-04-08 03:48    40  *********************
  16    2024-04-08 05:22    39  ********************
  17    2024-04-08 06:56    39  ********************
  18    2024-04-08 08:30    40  *********************
  19    2024-04-08 10:04    40  *********************
  20    2024-04-08 11:38    41  **********************
 ...    ..(  5 skipped).    ..  **********************
  26    2024-04-08 21:02    41  **********************
  27    2024-04-08 22:36    40  *********************
  28    2024-04-09 00:10    40  *********************
  29    2024-04-09 01:44    40  *********************
  30    2024-04-09 03:18     ?  -
  31    2024-04-09 04:52    40  *********************
  32    2024-04-09 06:26    44  *************************
  33    2024-04-09 08:00     ?  -
  34    2024-04-09 09:34    43  ************************
  35    2024-04-09 11:08    44  *************************
  36    2024-04-09 12:42    44  *************************
  37    2024-04-09 14:16    43  ************************
  38    2024-04-09 15:50    43  ************************
  39    2024-04-09 17:24    42  ***********************
  40    2024-04-09 18:58    40  *********************
  41    2024-04-09 20:32    40  *********************
  42    2024-04-09 22:06    40  *********************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4             179  ---  Lifetime Power-On Resets
0x01  0x010  4            2194  ---  Power-on Hours
0x01  0x018  6      4634210296  ---  Logical Sectors Written
0x01  0x020  6        13415147  ---  Number of Write Commands
0x01  0x028  6      2549974973  ---  Logical Sectors Read
0x01  0x030  6         3930767  ---  Number of Read Commands
0x01  0x038  6               -  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4            1191  ---  Spindle Motor Power-on Hours
0x03  0x010  4            1148  ---  Head Flying Hours
0x03  0x018  4             271  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x03  0x038  4               0  ---  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4             158  ---  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4            9689  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              40  ---  Current Temperature
0x05  0x010  1              41  ---  Average Short Term Temperature
0x05  0x018  1               -  ---  Average Long Term Temperature
0x05  0x020  1              47  ---  Highest Temperature
0x05  0x028  1              28  ---  Lowest Temperature
0x05  0x030  1              45  ---  Highest Average Short Term Temperature
0x05  0x038  1              34  ---  Lowest Average Short Term Temperature
0x05  0x040  1               -  ---  Highest Average Long Term Temperature
0x05  0x048  1               -  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              70  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4           79669  ---  Number of Hardware Resets
0x06  0x010  4           39596  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

root@truenas[~]#                                                                                                        
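
For comparing these counters across several drives, a small parser can help. The following is a minimal sketch, not an official tool: it assumes the `Device Statistics (GP Log 0x04)` layout shown above, the function name `parse_device_stats` is made up for illustration, and the sample rows are copied from this drive's output. It makes no claim about what ratio counts as healthy.

```python
import re

# Matches data rows of the "Device Statistics (GP Log 0x04)" table, e.g.
# "0x06  0x008  4           79669  ---  Number of Hardware Resets".
# Section header rows ("0x01  =====  = ...") do not match the offset field.
ROW = re.compile(
    r"^(0x[0-9a-f]{2})\s+(0x[0-9a-f]{3})\s+(\d+)\s+(\S+)\s+(\S+)\s+(.+)$"
)

def parse_device_stats(text):
    """Return {description: int value}; rows whose value is '-' are skipped."""
    stats = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        value, description = m.group(4), m.group(6).strip()
        if value.isdigit():
            stats[description] = int(value)
    return stats

# Sample rows copied from the output above.
SAMPLE = """\
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x010  4            2194  ---  Power-on Hours
0x01  0x038  6               -  ---  Date and Time TimeStamp
0x04  0x010  4            9689  ---  Resets Between Cmd Acceptance and Completion
0x06  0x008  4           79669  ---  Number of Hardware Resets
0x06  0x018  4               0  ---  Number of Interface CRC Errors
"""

stats = parse_device_stats(SAMPLE)
resets_per_hour = stats["Number of Hardware Resets"] / stats["Power-on Hours"]
print(f"{resets_per_hour:.1f} hardware resets per power-on hour")
```

Running the same function over the output of every disk makes it easier to see whether one drive stands out or whether all drives behind the same controller show similar numbers.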


No, the disks look fine. Not sure what the error was all about; clearly they run long tests just fine…

I would guess there is something wrong with how the drives are presented to TN.
The resource "Absolutely must virtualize TrueNAS!" ... a guide to not completely losing your data (TrueNAS Community) states that you want a Type 1 hypervisor. Now, I do not know Unraid, but I have never seen it used as such: generally we see VMware, XCP-ng, and Proxmox.

What’s the output of lspci?

lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 05)
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:04.0 Communication controller: Red Hat, Inc. Virtio console
00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:06.0 SATA controller: Intel Corporation Device 7ae2 (rev 11)
00:07.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:07.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:07.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:07.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:08.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
00:09.0 Non-Volatile memory controller: Intel Corporation Optane SSD 900P Series
00:0a.0 Non-Volatile memory controller: Intel Corporation Optane SSD 900P Series
root@truenas[~]# 
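
To double-check which storage controllers the guest actually sees, the listing above can be filtered by device class. This is a minimal sketch under stated assumptions: the function name `storage_controllers` is made up, and `STORAGE_CLASSES` only contains the class names that appear in this particular output, so it is not an exhaustive list.

```python
# Device classes from the listing above that can carry disks.
# Assumption: this tuple is illustrative, not a complete list of lspci classes.
STORAGE_CLASSES = (
    "SATA controller",
    "Serial Attached SCSI controller",
    "SCSI storage controller",
    "Non-Volatile memory controller",
    "IDE interface",
)

def storage_controllers(lspci_text):
    """Return (slot, device class, device name) for storage-class lines."""
    found = []
    for line in lspci_text.splitlines():
        slot, _, rest = line.partition(" ")
        dev_class, sep, device = rest.partition(": ")
        if sep and dev_class in STORAGE_CLASSES:
            found.append((slot, dev_class, device))
    return found

# Sample lines copied from the output above.
SAMPLE = """\
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:06.0 SATA controller: Intel Corporation Device 7ae2 (rev 11)
00:08.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
00:09.0 Non-Volatile memory controller: Intel Corporation Optane SSD 900P Series
"""

controllers = storage_controllers(SAMPLE)
for slot, dev_class, device in controllers:
    print(f"{slot}  {dev_class}: {device}")
```

In this listing the Intel SATA controller, the SAS3008 HBA and the two Optane devices show up with their real vendor names, while the Virtio block device is a paravirtual disk provided by the hypervisor.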

Thanks, I went through it and the article’s suggestions are actually pretty interesting. Unfortunately it seems that the developers mostly considered ESXi, while Proxmox, or KVM in general, was only mentioned briefly. Is there any documentation or testing that shows why it has to be a Type 1 hypervisor? It would be nice to know where the decision comes from, because whether Type 1 or Type 2 is better, in general or for TN VMs in particular, seems to be a big discussion in other communities.

I know you are the experts, but the article (even with the updates from this year) seems to be a little out of date. When I go to the old TN forum, there is even a banner on top that points to the article about virtualization. Neglecting Hyper-V (because it’s mostly interesting for Windows), KVM makes up 60% of virtualization, and a Type 1, which would be Xen, only makes up roughly 10% of all commercialized VM tasks. Why would TN, if it keeps supporting virtualized installs, optimize the system for a hypervisor type that (even commercially) is “barely” used, to put it simply?

Still an interesting topic: using TN for ZFS under Proxmox and Unraid is pretty popular, and there are tons of tutorials from all the tech YouTube/blog guys out there. I just hope that I can get this fixed to a point where I can rely on the system for daily use (with a backup, obviously). It would be a bummer to find out that I have to change to another hypervisor after spending well over a year on KVM.

BTW: let me know if you need hardware/BIOS specs. I have had bad experiences with certain BIOS settings and have always been wary of PCIe power management settings that could screw up virtualization. Unfortunately I could never reproduce anything reliably enough to make a statement there. Maybe you know of certain BIOS settings that TN is “allergic” to (also considering the LSI HBA).

The key to success with virtualizing TrueNAS is to PCIe pass through the disk controller that will be used for the data drives, be it integrated SATA, a SAS HBA, etc.

Type 1, Type 2: well, modern hypervisors are sort of a hybrid anyway. I think the point is not to use an app running on Windows or macOS, but rather a kernel hypervisor.

Oh, like Kernel Virtual Machine, or KVM.

Anyway, one wonders why you’d bother running TrueNAS on top of Unraid.

TrueNAS has VM support and the benefit is the VMs are then on top of the storage, rather than the other way round where you end up back linking the storage to the hypervisor over slower protocols!

Type 1 hypervisors run directly on the host’s physical hardware, whereas Type 2 hypervisors are installed on top of an OS… meaning 1 is closer to the hardware than 2, which historically has proven to cause less trouble. For the same reason, Type 1 is generally considered safer than Type 2.

Yeah, but it’s blurred.

ESXi is Type 1. But it’s based on Linux. So is it? Is it running on bare metal? Or is it running on Linux?

Proxmox is based on Debian… is it Type 1?

TrueNAS SCALE is also based on Debian… is it Type 1?

Oh, they’re kernel hypervisors.

So what about Hyper-V? That’s running on Windows… but it’s a kernel hypervisor too…

And Windows these days sometimes shifts itself to run inside Hyper-V side-by-side with what the user perceives to be the VM. Blurry lines abound.

As far as I know, KVM is Type 2. Btw @blacklight, I found this article: Type 1 vs Type 2 Hypervisors - Difference Between Hypervisor Types - AWS

But why? KVM is running on bare metal. It’s a Kernel implementation.

QEMU is then using it.

So, Proxmox is type 2 then…

Or is it type 1?

But I think it’s fair to say that VirtualBox is Type 2, and that is what should be avoided.

KVM is Type 1 because the KVM layer is below the Linux OS layer.

https://ubuntu.com/blog/kvm-hyphervisor

The AWS article lists ESXi, Hyper-V and KVM as Type 1 hypervisors.

Thanks for all the feedback and information folks.

I actually decided on this setup for multiple reasons:

  • I started with TN Core on bare metal on an old system and got stuck on virtualization after just a few weeks, because TN Core wasn’t offering the hypervisor capabilities (SCALE as a hypervisor wasn’t a big thing 2 years ago, as far as I know)
  • I saw that I needed much more performance and functionality for my home lab
  • I didn’t know what hardware to choose. Best case would be a Xeon rack with U.2 SSDs all across the machine … on the other end of the spectrum just a Raspberry Pi running TN … I had no idea what I needed, and since I cannot afford dedicated hardware (Xeon, server mobo etc.), I didn’t know what to look for
  • because of rising electricity prices and space constraints, my goal was to have only one main home lab machine combining performance and a NAS
  • I wanted to play with differing resources for the NAS to watch the effects long term (that’s why I wanted CPU pinning and a hypervisor underneath)
  • from tutorials by multiple YouTubers and home lab bloggers I saw that it works well … at least for them … I never expected that I would be the only exception on earth. I went for the best workstation hardware I could find (-> Asus W680 ACE → way fewer problems and more compatibility than Supermicro/Gigabyte workstation mobos) that was still consumer-grade, because, as already said, I can’t afford today’s Xeon CPUs …

… and now the only thing left is that I want to get TN up and running reliably. The performance is already where I want it to be. I get over 2GB/s internal and over 1Gb/s over 10G LAN without tweaking anything. AND managing my shares and disks in TrueNAS is just a dream … if it weren’t for the occasional errors that I can’t reproduce.

So to get back on track … a new error:

Checked sas3flash tool again:

The drives are:

gptid/d2901fa1-e72b-11ee-9b38-23810fb24440     N/A  ada5p2
gptid/d25ed07c-e72b-11ee-9b38-23810fb24440     N/A  da1p2

So different drives are causing the problem right now. I already moved the HBA from the x4 chipset slot to the x8 bifurcated slot … the only thing left, next to swapping disks, is swapping the HBA or at least the SATA cables.
Unfortunately I work on the system remotely from the US (the system is located in Germany), so it would be a challenge, but well …

The error occurred while READING data over an NFS share from a Nextcloud Docker container. I was actually opening pictures and then suddenly Nextcloud stopped loading them. Another system was writing 100s of GBs to the server overnight via SFTP over VPN without problems …

Can I benchmark the drives attached to the HBA somehow? Also, I don’t know whether these are the ones attached via the HBA or the mobo. Is the speed maybe the problem? SFTP was over the internet. The NFS share for Nextcloud is local … any troubleshooting ideas???
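
On the question of which controller those two drives hang off: on FreeBSD the device name prefix already gives a hint. `ada` devices are ATA disks attached through the AHCI/ATA driver (typically a SATA controller), while `da` devices come through the SCSI layer (SAS HBAs, but also USB storage). The following is a minimal sketch of that rule applied to the two lines above; the function name `attachment_hint` and the wording of the hints are made up for illustration, and the result is a hint, not proof.

```python
import re

# FreeBSD CAM naming: "ada" = ATA direct-access disk (AHCI/ATA driver),
# "da" = SCSI direct-access disk (SAS/SCSI HBA, also USB mass storage).
# The hint texts are illustrative wording, not output of any tool.
PREFIX_HINT = {
    "ada": "ATA/AHCI (likely the SATA controller)",
    "da": "SCSI/SAS (likely the HBA, or USB)",
}

def attachment_hint(provider):
    """Map a provider such as 'ada5p2' to (disk name, attachment hint)."""
    m = re.fullmatch(r"([a-z]+)(\d+)(?:p\d+)?", provider)
    if not m:
        return provider, "unknown"
    prefix, unit = m.group(1), m.group(2)
    return prefix + unit, PREFIX_HINT.get(prefix, "unknown")

# The two lines from the listing above.
LINES = """\
gptid/d2901fa1-e72b-11ee-9b38-23810fb24440     N/A  ada5p2
gptid/d25ed07c-e72b-11ee-9b38-23810fb24440     N/A  da1p2
"""

hints = [attachment_hint(line.split()[-1]) for line in LINES.splitlines()]
for disk, hint in hints:
    print(f"{disk}: {hint}")
```

To confirm rather than infer, `camcontrol devlist -v` in the TrueNAS Core shell lists each disk under the bus and driver it is attached to, and `diskinfo -t /dev/da1` runs a simple seek and transfer test on a single disk (the device name is just the one from the listing above).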

I also had problems on the hypervisor side (reported by Unraid): QEMU was reporting a VFIO DMA MAP -22 error, but TN itself didn’t report anything and the system was still at full performance. That error sometimes caused freezes of the WHOLE SYSTEM and can occasionally be triggered by shutting down the TN VM. The errors never occurred at the same time; that’s why I think they are not connected. Just wanted to mention them. Since there are a lot of virtualization experts here, maybe one of you has stumbled over similar problems?

I was also considering switching to SCALE. That would change a lot, because we would no longer be talking about a FreeBSD guest. Do you think I should give that a try (even if it means not going back)? Maybe “Linux” on “Linux” makes it better …

Thanks in advance

damn, and I thought I was over this …

Celebrated too early for this one …
Any ideas?