Issue with my TrueNAS SCALE after power loss

I agree that there is still stuff to do before this can be closed, but can you be explicit about what you think still needs to be done?

As the next step in this, can you please run sudo zpool status -v and post the results; if they are clean you should probably then run a scrub.
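
For reference, a manual scrub can be started from the GUI or from the shell, e.g. (substituting your own pool name):

  • sudo zpool scrub yourpool
  • sudo zpool status -v yourpool (to watch its progress)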

Also, I note from the SMART results you posted that you are only doing Short self-tests, though you are doing them reasonably frequently (probably more frequently than necessary). For HDDs I would personally do a short test once per week and a long test once per month. You can schedule them to run simultaneously on all drives because they are self-contained - but do them at off-peak times and don't do them at the same time as a scrub.
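
These are normally scheduled from the TrueNAS GUI rather than by hand, but for reference the rough shell equivalents (run per drive, substituting the right device name) are:

  • sudo smartctl -t short /dev/sdX (short self-test)
  • sudo smartctl -t long /dev/sdX (long/extended self-test)
  • sudo smartctl -a /dev/sdX (check the results afterwards)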

If you haven’t already done so, you should also implement @joeschmuck’s Multi-Report script.

Hi Protopia,

Thanks for your reply,

I ran the script and, after it succeeded, I rebooted my server. One drive was not detected or was not working. Also, one of the new drives that I used for resilvering was getting disconnected, so I used a 6TB one instead, which is connected and working. Any ideas how I can remove the resilvering UNAVAIL drive?
Apart from this, I will follow the SMART test pattern you mentioned and also remove that 4TB Seagate drive with SMR tech. Resilvering is taking ages 😅

Below is the status.

admin@truenas[~]$ sudo zpool status -v
[sudo] password for admin:
  pool: WorkersDev04
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Mar 12 05:12:28 2025
        3.07T / 6.46T scanned at 745M/s, 857G / 6.46T issued at 203M/s
        202G resilvered, 12.97% done, 08:03:29 to go
config:

        NAME                                        STATE     READ WRITE CKSUM
        WorkersDev04                                DEGRADED     0     0     0
          raidz2-0                                  DEGRADED     0     0     0
            replacing-0                             DEGRADED     0     0     0
              531c0dfc-8fe3-42d8-812a-d5fc7142b9b6  ONLINE       0     0     0  (resilvering)
              8767114780025994078                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/53e0ba3b-0801-4485-b08a-df433ae01060
              d17f7ea3-9623-4476-8d24-eb8553064365  ONLINE       0     0     0  (resilvering)
            35aa855c-ffa1-491a-830c-4c867bc5c987    ONLINE       0     0     0  (resilvering)
            sdf1                                    ONLINE       0     0     0
            e4f4dca8-b41e-488f-bd12-92c595276c7d    ONLINE       0     0     0
            994bac5d-dbc6-4bd7-8592-295428c57b45    ONLINE       0     0     0  (resilvering)

errors: Permanent errors have been detected in the following files:

<I've removed the file locations for security reasons. (IDK if they will come back after resilvering is done, but I have a backup of those files anyway. What else could be done, and will it affect other good files as well?)>

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:31 with 0 errors on Mon Mar 10 03:45:32 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb3    ONLINE       0     0     0
            sdh3    ONLINE       0     0     0

errors: No known data errors

Hmmm … I don't quite understand the zpool status output. Comparing it to the previous output of zpool import, what it seems to say is that:

  1. Partuuid 35aa855c-ffa1-491a-830c-4c867bc5c987, which was first reported as faulted, is resilvering as expected.

  2. We have a disk shown as 8767114780025994078 ... was /dev/disk/by-partuuid/53e0ba3b-0801-4485-b08a-df433ae01060 that is no longer available. However, partuuid 53e0ba3b-0801-4485-b08a-df433ae01060 wasn't in the zpool import at all, so I have no idea what this is or where it came from, but I suspect that it was a ZFS label that had previously been part of the pool and had perhaps been replaced at some previous time. I think, or perhaps hope, that this is not going to be any real issue. However, I do wonder whether this was a cause of the pool not importing in the first place.

  3. Then we have two partuuids apparently resilvering the same device: 531c0dfc-8fe3-42d8-812a-d5fc7142b9b6, which was an original partuuid, and d17f7ea3-9623-4476-8d24-eb8553064365, which is a new one from somewhere. I am not sure how the same device could be shown as being resilvered twice, so I think we need to wait and see what happens after the resilver has finished.

  4. And we have one other partuuid being resilvered: 994bac5d-dbc6-4bd7-8592-295428c57b45.

So it looks like you have a RAIDZ2 which has resilvers on 3 of the 5 drives, which is NOT a good sign when RAIDZ2 only allows for 2 drives to be lost. But then again, if the ZFS labels are screwed up, perhaps the output of zpool status is also screwed up.

My advice is as follows:

Don't reboot again for the moment unless you have to, in order to reduce the risk of another pool import failure and to preserve the device name mappings. Whilst the resilver is still running, run the following commands (in some cases again) and post the results now:

  • sudo lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID again to give us the current device name mappings and post the results.
  • sudo zdb -l /dev/sdX for each disk of this pool, and sudo zdb -l /dev/sdXn against each of the ZFS partitions for this pool as shown in this lsblk output, to see what the ZFS labels are.

Then wait until the resilver completes, run sudo zpool status -v and the same commands again, and post the results. As mentioned, don't do anything else until we have all the above results and can see how consistent everything is.
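
For reference only, and not something to run until we have all of the above: if the stale UNAVAIL entry is still shown once the resilver has finished, it would normally be removed with zpool detach using the numeric GUID shown in zpool status, something like:

  • sudo zpool detach WorkersDev04 8767114780025994078

though ZFS usually detaches the old member of a replacing vdev automatically once the replacement completes, so this may not be needed at all.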

P.S. In future please do NOT use your initiative to “reboot the server” or “use a 6TB drive” unless we advise you to do so, as it may make things worse. I have no idea what the status was after the import worked and before you rebooted, and no idea whether doing these things made anything worse - but if you had a working pool with all files visible immediately after the import and you don’t when the resilvering completes then it is quite possible that doing these two things will have lost you some (or possibly even all) of your data.

Hi Protopia,

Was a bit busy last few days.

Regarding points 1, 2, 3 and 4: I think the data is back and I can access everything fine after resilvering.

This is the latest zpool status (currently scrubbing)

admin@truenas[~]$ sudo zpool status
  pool: WorkersDev04
 state: ONLINE
  scan: scrub in progress since Fri Mar 14 10:26:35 2025
        2.07T / 6.46T scanned at 75.6G/s, 0B / 6.46T issued
        0B repaired, 0.00% done, no estimated completion time
config:

        NAME                                      STATE     READ WRITE CKSUM
        WorkersDev04                              ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            d17f7ea3-9623-4476-8d24-eb8553064365  ONLINE       0     0     0
            35aa855c-ffa1-491a-830c-4c867bc5c987  ONLINE       0     0     0
            d37dc96f-bc91-4510-9e48-654e2b37f409  ONLINE       0     0     0
            e4f4dca8-b41e-488f-bd12-92c595276c7d  ONLINE       0     0     0
            994bac5d-dbc6-4bd7-8592-295428c57b45  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:31 with 0 errors on Mon Mar 10 03:45:32 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb3    ONLINE       0     0     0
            sdh3    ONLINE       0     0     0

errors: No known data errors

lsblk is showing this.



admin@truenas[~]$ sudo lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
NAME        MODEL                   ROTA PTTYPE TYPE     START          SIZE PARTTYPENAME             PARTUUID
sda         ST4000VX007-2DT166         1 gpt    disk           4000787030016
└─sda1                                 1 gpt    part      4096 4000784056832 Solaris /usr & Apple ZFS 35aa855c-ffa1-491a-830c-4c867bc5c987
sdb         CONSISTENT SSD S6 128GB    0 gpt    disk            128035676160
├─sdb1                                 0 gpt    part        40       1048576 BIOS boot                ff5d717e-ccd3-40d0-9735-e578907f719a
├─sdb2                                 0 gpt    part      2088     536870912 EFI System               3146caa8-78ac-4e92-9a0c-e7e68ac394b7
├─sdb3                                 0 gpt    part  34605096  110317850112 Solaris /usr & Apple ZFS e018c874-4812-4b3a-9e6b-c1f0e4d198e3
└─sdb4                                 0 gpt    part   1050664   17179869184 Linux swap               37128fac-e8c0-4d8f-9122-b60af51d2220
  └─md127                              0        raid1            17162043392
    └─md127                            0        crypt            17162043392
sdc         WDC WD40PURZ-85TTDY0       1 gpt    disk           4000785948160
└─sdc1                                 1 gpt    part      4096 4000783008256 Solaris /usr & Apple ZFS 994bac5d-dbc6-4bd7-8592-295428c57b45
sdd         WDC WD40PURX-64NZ6Y0       1 gpt    disk           4000787030016
└─sdd1                                 1 gpt    part      4096 4000784056832 Solaris /usr & Apple ZFS e4f4dca8-b41e-488f-bd12-92c595276c7d
sde         WDC WD60PURZ-85ZUFY1       1 gpt    disk           6001175126016
└─sde1                                 1 gpt    part      4096 4000785105408 Solaris /usr & Apple ZFS d17f7ea3-9623-4476-8d24-eb8553064365
sdg         WDC WD40PURZ-85TTDY0       1 gpt    disk           4000787030016
└─sdg1                                 1 gpt    part      4096 4000784056832 Solaris /usr & Apple ZFS d37dc96f-bc91-4510-9e48-654e2b37f409
sdh         EVM25/128GB                0 gpt    disk            128035676160
├─sdh1                                 0 gpt    part      4096       1048576 BIOS boot                10f746b2-d9b8-4364-8428-52432573852b
├─sdh2                                 0 gpt    part      6144     536870912 EFI System               143e569e-c80e-4088-bf97-316a879dcc44
├─sdh3                                 0 gpt    part  34609152  110315773440 Solaris /usr & Apple ZFS 4ef9db8b-a7ba-4749-ab68-59ddb19d125d
└─sdh4                                 0 gpt    part   1054720   17179869184 Linux swap               9e9e064a-a34c-4e38-8de8-28ca2c078584
  └─md127                              0        raid1            17162043392
    └─md127                            0        crypt            17162043392

I tried sudo zdb -l on /dev/sda, /dev/sdb and the other letters, but it didn't work.

For the 6TB one, it's a quick fix, but yeah, I need to use a 4TB one. I found surveillance drives quite cheap over here, so I use those. I hope they are fine (please let me know your thoughts). NAS ones are double the price or more compared to the surveillance drives.
Reboot 😅 I quite often do it as we don't have stable power here. I'm just scared of data loss; what just happened with me was like a nightmare.
IDK if it's the power supply or something else, but some drives keep disconnecting on their own. Not sure if it's the SATA power adapters, the drives themselves, or the power supply.
For the Molex-to-SATA splitter, I bought it from the PI Plus website (Pi+® (PiPlus®) Molex IDE 4Pin Male to 5 x SATA Power Cable-18AWG).
For the power supply, I'm running a 650W Bronze from Cooler Master, quite an old one.
I've ordered a new one; the model is TUF-GAMING-550B.
Also, I need a case. As of now the drives are all loose, some on the table and some inside the case; I don't have a proper case. Would a DIY case work, or should I look for a new one? I currently have a case in mind, i.e. the Prolab AI838 (it can support 10 drives). It's good but way too costly.

Thank you so much, TrueNAS community and @Protopia,
P

Were some of these drives previously used in another storage system? I don't think there should be md raid partitions on drives used in TrueNAS SCALE. If they were, you might want to remove the md raid superblocks. If TrueNAS sees a raid partition on two of the drives in its pool then it can/will mess things up upon boot.
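
A quick way to check (a sketch, using the sdb4/sdh4 partitions shown under md127 in your lsblk output as an example):

  • cat /proc/mdstat (lists any assembled md arrays, e.g. md127)
  • sudo mdadm --examine /dev/sdb4 /dev/sdh4 (shows whether an md superblock is present on those partitions)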

As per my previous post, do not reboot until we have checked the label status using the following commands:

  • sudo zdb -l /dev/sda1
  • sudo zdb -l /dev/sdc1
  • sudo zdb -l /dev/sdd1
  • sudo zdb -l /dev/sde1
  • sudo zdb -l /dev/sdg1

P.S. Not sure why there is no /dev/sdf - do you have an empty drive bay?

It could be residual partitions from the time when swap was enabled.

That was my guess too.

I would have to disagree somewhat, as I believe there is another likely possibility.
The only time I have seen these md raid partitions in Scale is when previously used disks had an old mdadm software raid on them, such as disks used in a different storage system (QNAP, Synology, a security system, purchased used disks, etc.).

I have 2 in-service systems that came up (upgraded/migrated) through all the versions of Scale to current. All the drives do have swap partitions (1024 as part 1 on each disk), as that was initially what the setup was in Scale. None of these drives currently have any md raid1 partitions.

Most of these drives did initially have md raid partitions as I reused disks from a couple of different storage systems. This presented all kinds of issues such as missing drives after reboot, drives that showed in lsblk but were not visible in Scale, etc.

Some of the affected disks would sometimes show up in the Scale GUI, but then I found that the format option in the disk setup of Scale would not clear the superblocks (as a safety measure, I was told), and the disks were still an issue after a reboot even though they had been made members of a vdev and thus part of a pool.

The correct permanent solution was to boot into a live linux environment, remove the superblocks on the affected drives, then zero the drives out. This removed all traces of the old software raid(s), and Scale could then properly set up the partitions on the drives it needs and not get tripped up with an old raid.
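
As a rough sketch of that procedure (run from a live environment, with the array stopped, and /dev/sdX standing in for an old raid member you are certain you want to wipe):

  • sudo mdadm --stop /dev/md127
  • sudo mdadm --zero-superblock /dev/sdX1 (repeat for each old raid member partition)
  • sudo wipefs -a /dev/sdX (clears any remaining partition/filesystem signatures)

after which Scale can partition the drive cleanly for its own use.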

My install, currently still on Dragonfish-24.04.2.5, has swap and md127 partitions on all drives that were there from the beginning. I did not add those myself or tinker with the swap in any way.

It started as an Angelfish install in mid-2022.

Interesting. I started from bare metal scratch with Bluefin on one and Cobia on the other.

Hi Protopia,

I ran the commands you mentioned; here are the results.

  1. sda1
admin@truenas[~]$ sudo zdb -l /dev/sda1
[sudo] password for admin:
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'WorkersDev04'
    state: 0
    txg: 577162
    pool_guid: 15540060179200563292
    errata: 0
    hostid: 202221155
    hostname: 'truenas'
    top_guid: 1007581029064245633
    guid: 10950750391358531691
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 1007581029064245633
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 20003891445760
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 13362088514984072882
            path: '/dev/disk/by-partuuid/d17f7ea3-9623-4476-8d24-eb8553064365'
            whole_disk: 0
            DTL: 437
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 10950750391358531691
            path: '/dev/disk/by-partuuid/35aa855c-ffa1-491a-830c-4c867bc5c987'
            whole_disk: 0
            DTL: 3561
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 2064555734680183965
            path: '/dev/disk/by-partuuid/d37dc96f-bc91-4510-9e48-654e2b37f409'
            whole_disk: 0
            DTL: 687
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 10370810081988863495
            path: '/dev/disk/by-partuuid/e4f4dca8-b41e-488f-bd12-92c595276c7d'
            whole_disk: 0
            DTL: 3559
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 4470646324313395436
            path: '/dev/disk/by-partuuid/994bac5d-dbc6-4bd7-8592-295428c57b45'
            whole_disk: 0
            DTL: 3558
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3

  2. sdc1
admin@truenas[~]$ sudo zdb -l /dev/sdc1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'WorkersDev04'
    state: 0
    txg: 577162
    pool_guid: 15540060179200563292
    errata: 0
    hostid: 202221155
    hostname: 'truenas'
    top_guid: 1007581029064245633
    guid: 4470646324313395436
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 1007581029064245633
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 20003891445760
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 13362088514984072882
            path: '/dev/disk/by-partuuid/d17f7ea3-9623-4476-8d24-eb8553064365'
            whole_disk: 0
            DTL: 437
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 10950750391358531691
            path: '/dev/disk/by-partuuid/35aa855c-ffa1-491a-830c-4c867bc5c987'
            whole_disk: 0
            DTL: 3561
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 2064555734680183965
            path: '/dev/disk/by-partuuid/d37dc96f-bc91-4510-9e48-654e2b37f409'
            whole_disk: 0
            DTL: 687
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 10370810081988863495
            path: '/dev/disk/by-partuuid/e4f4dca8-b41e-488f-bd12-92c595276c7d'
            whole_disk: 0
            DTL: 3559
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 4470646324313395436
            path: '/dev/disk/by-partuuid/994bac5d-dbc6-4bd7-8592-295428c57b45'
            whole_disk: 0
            DTL: 3558
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3

  3. sdd1
admin@truenas[~]$ sudo zdb -l /dev/sdd1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'WorkersDev04'
    state: 0
    txg: 577162
    pool_guid: 15540060179200563292
    errata: 0
    hostid: 202221155
    hostname: 'truenas'
    top_guid: 1007581029064245633
    guid: 10370810081988863495
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 1007581029064245633
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 20003891445760
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 13362088514984072882
            path: '/dev/disk/by-partuuid/d17f7ea3-9623-4476-8d24-eb8553064365'
            whole_disk: 0
            DTL: 437
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 10950750391358531691
            path: '/dev/disk/by-partuuid/35aa855c-ffa1-491a-830c-4c867bc5c987'
            whole_disk: 0
            DTL: 3561
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 2064555734680183965
            path: '/dev/disk/by-partuuid/d37dc96f-bc91-4510-9e48-654e2b37f409'
            whole_disk: 0
            DTL: 687
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 10370810081988863495
            path: '/dev/disk/by-partuuid/e4f4dca8-b41e-488f-bd12-92c595276c7d'
            whole_disk: 0
            DTL: 3559
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 4470646324313395436
            path: '/dev/disk/by-partuuid/994bac5d-dbc6-4bd7-8592-295428c57b45'
            whole_disk: 0
            DTL: 3558
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3

  4. sde1
admin@truenas[~]$ sudo zdb -l /dev/sde1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'WorkersDev04'
    state: 0
    txg: 577162
    pool_guid: 15540060179200563292
    errata: 0
    hostid: 202221155
    hostname: 'truenas'
    top_guid: 1007581029064245633
    guid: 13362088514984072882
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 1007581029064245633
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 20003891445760
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 13362088514984072882
            path: '/dev/disk/by-partuuid/d17f7ea3-9623-4476-8d24-eb8553064365'
            whole_disk: 0
            DTL: 437
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 10950750391358531691
            path: '/dev/disk/by-partuuid/35aa855c-ffa1-491a-830c-4c867bc5c987'
            whole_disk: 0
            DTL: 3561
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 2064555734680183965
            path: '/dev/disk/by-partuuid/d37dc96f-bc91-4510-9e48-654e2b37f409'
            whole_disk: 0
            DTL: 687
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 10370810081988863495
            path: '/dev/disk/by-partuuid/e4f4dca8-b41e-488f-bd12-92c595276c7d'
            whole_disk: 0
            DTL: 3559
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 4470646324313395436
            path: '/dev/disk/by-partuuid/994bac5d-dbc6-4bd7-8592-295428c57b45'
            whole_disk: 0
            DTL: 3558
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3

  5. sdg1
admin@truenas[~]$ sudo zdb -l /dev/sdg1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'WorkersDev04'
    state: 0
    txg: 577162
    pool_guid: 15540060179200563292
    errata: 0
    hostid: 202221155
    hostname: 'truenas'
    top_guid: 1007581029064245633
    guid: 2064555734680183965
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 1007581029064245633
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 20003891445760
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 13362088514984072882
            path: '/dev/disk/by-partuuid/d17f7ea3-9623-4476-8d24-eb8553064365'
            whole_disk: 0
            DTL: 437
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 10950750391358531691
            path: '/dev/disk/by-partuuid/35aa855c-ffa1-491a-830c-4c867bc5c987'
            whole_disk: 0
            DTL: 3561
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 2064555734680183965
            path: '/dev/disk/by-partuuid/d37dc96f-bc91-4510-9e48-654e2b37f409'
            whole_disk: 0
            DTL: 687
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 10370810081988863495
            path: '/dev/disk/by-partuuid/e4f4dca8-b41e-488f-bd12-92c595276c7d'
            whole_disk: 0
            DTL: 3559
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 4470646324313395436
            path: '/dev/disk/by-partuuid/994bac5d-dbc6-4bd7-8592-295428c57b45'
            whole_disk: 0
            DTL: 3558
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3

Hope this helps.
BTW, will I need to change the SSDs? I've a few spare 240GB SATA SSDs.

Regards,
P