TrueNAS SCALE migration to another disk

Hi all, I posted the other week about my boot-pool disk reporting a failed sector. Fast forward to today: I was having an issue, dug a little deeper, and found that the storage for TrueNAS, which is presented to my Proxmox VMs, was offline. I powered off the server and reseated the disk, and it has been detected again. I know this isn't a permanent solution, but it was at least enough to let me into TrueNAS.

When I logged in, there were no errors about the boot-pool having a bad sector, so now I'm a little confused. I can't run a SMART test on this drive either. It's a Crucial MX500 250GB 2.5in SATA SSD (CT250MX500SSD1). I've got a spare 128GB SSD here that I can migrate to.

What do I need to back up within TrueNAS so that I can smoothly import my configuration to another disk? When I do an upgrade to SCALE, I also back up the config at that time; would that suffice? Would I also be able to use Clone VM in Proxmox so that I don't need to create another VM from scratch?

Thank you in advance.

Does that drive not support SMART?
What is the drive identifier (sda, sdb, etc.)? From the Shell, run smartctl -t long /dev/sda, assuming sda is the drive identifier.
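If you are not sure which sdX name belongs to the SSD, the /dev/disk/by-id links can tell you, since udev creates them as symlinks to the kernel device. A minimal sketch, assuming GNU readlink as found on Proxmox and SCALE; kernel_name is a made-up helper name, and the by-id path in the usage comment is the one quoted elsewhere in this thread:

```sh
# Hypothetical helper: print the kernel name (sda, sdb, ...) behind a
# /dev/disk/by-id link. The by-id entries are symlinks to ../../sdX,
# so resolving the link gives the name smartctl expects.
kernel_name() {
    basename "$(readlink -f "$1")"
}

# Usage, on the machine that sees the physical disk:
#   dev=$(kernel_name /dev/disk/by-id/ata-CT250MX500SSD1_2239E66E8FCD)
#   smartctl -t long "/dev/$dev"
```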

I am confused. Your title is for SCALE, yet you state you are doing an upgrade? I have to assume (never let us assume) that you are currently on a version of CORE (which one I don't know, but I will assume it is 13.0-U6.1).

Yes, for the most part you can use your config file to restore to either CORE or SCALE.

I'm also confused, as I think you are passing a physical boot drive through Proxmox to boot the TrueNAS system. It sounds like you are not using Proxmox for anything more than passing through ALL the drives and letting it run. I suspect you are using the NIC too, but that is also an assumption. Why not create a virtual boot drive? Create a 16GB virtual drive to pass to the VM and use that instead of a physical boot drive? Just asking. I run ESXi and create a virtual boot drive, and it sure does make my life much easier. I can clone the boot drive, upgrade, and if I don't like it, roll it all back.

A note of caution: when you upgrade, DO NOT upgrade your ZFS feature flags, or you will not be able to roll back to the previous version.
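You can check, read-only, which feature flags are still off on the boot pool; those are the ones zpool upgrade would switch on, and once a feature is enabled it cannot be disabled again. A sketch, assuming the pool is named boot-pool as on SCALE; disabled_features is a hypothetical helper name:

```sh
# Hypothetical helper: list ZFS features still 'disabled' on a pool, i.e. the
# ones `zpool upgrade <pool>` would enable. Reads the output of
#   zpool get -H -o property,value all <pool>
# on stdin (tab-separated property/value pairs).
disabled_features() {
    awk -F '\t' '$1 ~ /^feature@/ && $2 == "disabled" { sub(/^feature@/, "", $1); print $1 }'
}

# Read-only check; nothing on the pool is changed:
#   zpool get -H -o property,value all boot-pool | disabled_features
# Plain `zpool upgrade` with no pool argument also only reports, it does not upgrade.
```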

If I try this, I get the following (the disk is a Crucial MX500 250GB 3D NAND):
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.74-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Long (extended) offline self test failed [unsupported scsi opcode]

I've had a few beers already. I should clarify that I recently upgraded to TrueNAS SCALE Cobia from the previous release.

Yes, I am using Proxmox. It hosts my VMs (TrueNAS, pfSense and Blue Iris), and TrueNAS presents the storage to Proxmox. I had a dabble with VMs on SCALE but wasn't overly impressed with it.

I've got two SSDs, one for Proxmox and the other for TrueNAS. Are you suggesting that I put the TrueNAS VM on the same disk as Proxmox?

All my storage in TrueNAS is passed through via an HBA that is flashed to IT mode.

Okay, so it's a SCSI interface. Try this and maybe it will work:

smartctl -d sat -t long /dev/sda

Hopefully that will be the ticket for you.
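If you would rather script the attempt than retype it, here is a sketch that tries smartctl's automatic detection first and then SAT (-d sat), judging success from smartctl's exit status. It assumes the exit-status convention from the smartctl man page (bits 0 to 2 mean the command could not be carried out); start_long_test, smartctl_ok and the SMARTCTL override are hypothetical names:

```sh
# Hypothetical helper. smartctl's exit status is a bit mask: bits 0-2 mean the
# command could not be carried out (bad arguments, device open/identify failure,
# SMART command failure); higher bits report disk-health findings and are
# ignored here.
smartctl_ok() {
    [ $(( $1 & 7 )) -eq 0 ]
}

# Start a long self-test on $1, trying auto-detection first and then
# SCSI-to-ATA translation (-d sat). Prints the device type that worked.
# SMARTCTL can point at another binary, e.g. for a dry run.
start_long_test() {
    for dtype in auto sat; do
        "${SMARTCTL:-smartctl}" -d "$dtype" -t long "$1" >/dev/null 2>&1
        if smartctl_ok $?; then
            echo "$dtype"
            return 0
        fi
    done
    echo "no device type worked for $1" >&2
    return 1
}

# Usage: start_long_test /dev/sda
```

If the disk the VM sees is one that QEMU emulates, both attempts appear to fail in the same way, which matches the "unsupported scsi opcode" errors posted in this thread.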

Stop bragging!

I was suggesting that when you create your TrueNAS VM, you create a single 16GB VM disk and pass through your data drives; that virtual disk then becomes your boot drive. If you are running other VMs you already do this, except that this one would also have your physical data drives passed through. Your boot drive would live on Proxmox's storage. Of course, if you have something that works for you and you are happy with it, don't let me talk you out of it.
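The virtual boot disk can be added from the Proxmox GUI, or with qm set on the host, where the STORAGE:SIZE form (size in GiB) allocates a new volume. A sketch that only prints the command for review; the VM ID 100, the local-lvm storage and the scsi3 slot are placeholder assumptions, so check qm config for a free slot first (the MX500 already sits on scsi2 in this thread):

```sh
# Hypothetical helper: print (not run) the qm command that adds a new
# virtual disk to a VM. Defaults are placeholders: storage local-lvm,
# slot scsi3, size 16 (GiB).
add_boot_disk_cmd() {
    vmid=$1 storage=${2:-local-lvm} slot=${3:-scsi3} size=${4:-16}
    echo "qm set $vmid --$slot $storage:$size"
}

# Review the printed command, run it on the Proxmox host, then confirm:
#   add_boot_disk_cmd 100
#   qm config 100
```

After that, the usual route would be a fresh TrueNAS install onto the new virtual disk, followed by uploading the saved configuration file.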

I tried this but it wasn’t successful:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.74-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Read Device Identity failed: scsi error unsupported scsi opcode

If this is a USB connected device, look at the various --device=TYPE variants
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

It's mid-afternoon here, the perfect time for another.

Yes, my Proxmox SSD is the main boot drive. I then gave TrueNAS its own SSD, so the two SSDs are separate. With Proxmox, could I clone my TrueNAS to a USB flash drive temporarily, bring up TrueNAS on that, and then move it to the spare SSD that I have?

Also, is this SSD (Silicon Power A55 128GB-4TB SATA III 6Gb/s 2.5-inch Internal Solid State Drive) okay, or should I take this as an opportunity to use a better SSD that doesn't use NAND technology?

I have one of those drives myself. For light duty they are fine. Always make sure you have a backup of your configuration file so that if the boot drive fails, restoration is easy and fairly fast.
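If you download the configuration file regularly, a small helper can keep dated copies and prune old ones. A sketch under assumptions: the file is the one saved from the SCALE web UI (System Settings > General > Manage Configuration > Download File), the backup directory is dedicated to these copies because everything in it beyond the newest N files is deleted, and keep_config_backup is a hypothetical name:

```sh
# Hypothetical helper: copy a downloaded TrueNAS config file into a dedicated
# backup directory with a timestamp prefix, then delete all but the newest
# $3 files (default 10) in that directory.
keep_config_backup() {
    src=$1 dir=$2 keep=${3:-10}
    mkdir -p "$dir" || return 1
    cp -- "$src" "$dir/$(date +%Y%m%d-%H%M%S)-$(basename "$src")" || return 1
    # ls -t lists newest first; everything after the first $keep entries goes.
    ls -1t "$dir" | tail -n +"$((keep + 1))" | while IFS= read -r old; do
        rm -f -- "$dir/$old"
    done
}

# Usage (paths are placeholders):
#   keep_config_backup /path/to/downloaded-config.tar /path/to/config-backups 10
```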

There should be a setting to force the SMART test to run. I was hoping the SCSI-to-ATA Translation setting (-d sat) would do the trick. Can you pull a normal SMART report with 'smartctl -a /dev/ada0', or even -x? Even if you cannot run a test, that might be helpful. Also, have you tried setting up a self-test from the TrueNAS GUI?

Okay, I was able to get the information (I did have to change it to sda), but this is the output:

sudo smartctl -x /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.74-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               QEMU
Product:              QEMU HARDDISK
Revision:             2.5+
Compliance:           SPC-3
User Capacity:        250,059,350,016 bytes [250 GB]
Logical block size:   512 bytes
LU is thin provisioned, LBPRZ=0
Device type:          disk
Local Time is:        Sun Apr 28 15:35:13 2024 AEST
SMART support is:     Unavailable - device lacks SMART capability.
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

Device does not support Self Test logging
Device does not support Background scan results logging
Device does not support General statistics and performance logging

If I attempt a self-test via Storage > Disks > sda > Manual Test (short or long), I get:
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.74-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Long (extended) offline self test failed [unsupported scsi opcode]

Well, I'm glad you were able to get some sort of output; unfortunately, it shows SMART support is: Unavailable.

A quick search on the internet leads me to believe it comes down to how your VM is set up to pass through this drive. I can't tell you whether this is normal or not.
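The smartctl -x output above already carries the tell-tale signs: Vendor QEMU, Product QEMU HARDDISK, and SMART support Unavailable. A small sketch that classifies a device from smartctl -i output along those lines; smart_view and the labels it prints are made up for illustration:

```sh
# Hypothetical helper: read `smartctl -i` output on stdin and print
#   emulated       - the disk is presented by QEMU (SMART cannot reach the SSD)
#   smart-capable  - smartctl reports SMART as available
#   no-smart       - neither of the above
smart_view() {
    awk '
        /^Vendor:[ \t]+QEMU/                { qemu = 1 }
        /^SMART support is:[ \t]+Available/ { smart = 1 }
        END {
            if (qemu)       print "emulated"
            else if (smart) print "smart-capable"
            else            print "no-smart"
        }'
}

# Usage: smartctl -i /dev/sda | smart_view
```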

Are you passing through the entire controller or just the drive? More importantly, what about your data drives: can you run a SMART test on those? If you cannot, then I would think you have an issue to solve, but you may be able to test them. If so, then maybe figure out what the difference is between the two.

I'm passing through the HBA, which has my storage connected to it.
The SSD on the TrueNAS VM looks like this in Proxmox: Hard Disk (scsi2) /dev/disk/by-id/ata-CT250MX500SSD1_2239E66E8FCD,size=244198584K
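That line suggests the whole SSD is handed to QEMU by its by-id path, and QEMU then presents it to the guest as its own emulated disk. A quick consistency check is to convert the size= field to bytes. A sketch, assuming the line is in the format qm config prints (scsi2: PATH,size=...) and that Proxmox size suffixes are powers of 1024; pve_disk_info is a hypothetical helper:

```sh
# Hypothetical helper: print the backing path and the size in bytes of a
# Proxmox disk line such as "scsi2: /dev/disk/by-id/...,size=244198584K".
pve_disk_info() {
    printf '%s\n' "$1" | awk -F, '{
        path = $1
        sub(/^[a-z]+[0-9]+:[ \t]*/, "", path)    # drop a leading "scsi2: "
        bytes = "unknown"
        for (i = 2; i <= NF; i++) {
            if ($i !~ /^size=/) continue
            v = substr($i, 6)                     # e.g. "244198584K"
            unit = substr(v, length(v))
            mult = 1
            if (unit == "K") mult = 1024
            else if (unit == "M") mult = 1024 ^ 2
            else if (unit == "G") mult = 1024 ^ 3
            else if (unit == "T") mult = 1024 ^ 4
            bytes = sprintf("%.0f", (v + 0) * mult)
        }
        print path, bytes
    }'
}
```

For the line above it prints the by-id path followed by 250059350016, the same User Capacity that smartctl reported inside the VM, so the VM sees the entire MX500 through QEMU. SMART queries therefore need to run on the Proxmox host against that by-id path.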

It's been a bit of a crazy week, but I managed to get the following from Proxmox for the disk:

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT250MX500SSD1
Serial Number:    2239E66E8FCD
LU WWN Device Id: 5 00a075 1e66e8fcd
Firmware Version: M3CR045
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat May  4 15:32:16 2024 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12263
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       41
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   077   077   000    Old_age   Always       -       301
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       23
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       49
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   072   049   000    Old_age   Always       -       28 (Min/Max 0/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   077   077   001    Old_age   Offline      -       23
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       15708296424
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       143250021
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       1008083555

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Completed [00% left] (0-65535)
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
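To pull the wear-related numbers out of a report like this, here is a sketch in awk. It assumes the attribute names smartctl prints for this MX500, that Total_LBAs_Written counts 512-byte logical sectors (the logical sector size reported above), that the raw value of Percent_Lifetime_Remain is the percentage used (its normalized value, 077, being the remainder), and that (host + FTL program pages) / host program pages is a reasonable write-amplification estimate; mx500_wear is a hypothetical name:

```sh
# Hypothetical helper: summarise wear from the attribute table of
# `smartctl -A` (or -a / -x) output on stdin. MX500-specific attribute names;
# $2 is ATTRIBUTE_NAME and $10 is RAW_VALUE.
mx500_wear() {
    awk '
        $2 == "Total_LBAs_Written"      { lbas = $10 }
        $2 == "Host_Program_Page_Count" { host = $10 }
        $2 == "FTL_Program_Page_Count"  { ftl = $10 }
        $2 == "Percent_Lifetime_Remain" { used = $10 }
        $2 == "Reallocate_NAND_Blk_Cnt" { realloc = $10 }
        END {
            printf "written_TB=%.2f\n", lbas * 512 / 1e12
            printf "lifetime_used_pct=%d\n", used
            if (host > 0) printf "write_amplification=%.2f\n", (host + ftl) / host
            printf "reallocated_blocks=%d\n", realloc
        }'
}

# Usage on the Proxmox host:
#   smartctl -A /dev/disk/by-id/ata-CT250MX500SSD1_2239E66E8FCD | mx500_wear
```

Fed the report above, it should print written_TB=8.04, lifetime_used_pct=23, write_amplification=8.04 and reallocated_blocks=0.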