Pool disappears and disks are unattached

I’m quite new to TrueNAS and Linux in general, and I’ve run into a very serious problem.

Sorry if I’m posting in the wrong area; I’m new here. Thanks in advance.

I run TrueNAS-SCALE-23.10.2
CPU: i5-9400
RAM: 16GB DDR4
GPU: GT 1030

Pool: 2x 500GB disks and 1x 1TB disk
It was configured as RAID5.

Last night I went to access my Samba server and couldn’t. I thought the server was off, but it was on. The web UI said that there was no pool created; in the Storage dashboard my pool appeared, but under “offline VDEVs”. I kept looking and saw that someone solved a similar problem by exporting the pool and then importing it, but that didn’t work for me: I get the error “[EZFS_IO] Failed to import ‘data’ pool: cannot import ‘data’ as ‘data’: I/O error”. Trying more commands, one told me that the pool metadata was corrupt and sent me to an OpenZFS link about the error “ZFS-8000-72”. I haven’t been able to do much more. I need to recover the data, since it contains very important memories for me.

  1. When you say RAID5, do you mean RAIDZ1?

  2. Problems have been reported with the very latest point releases of 24.04 and 24.10, but not, as far as I am aware, with 23.10.2.

  3. The error message you got was probably from running sudo zpool import, and TBH it is NOT a good sign. If your pool has metadata errors, the chances are that you will need to destroy and recreate it and then restore from backup. That said, it may be possible to revert to an earlier commit point, to revert to an earlier snapshot for the datasets that have metadata errors, or (most likely) to mount the pool read-only so you can copy your data elsewhere (see the sketch after this list). But we will need to understand the details of your issue before we can help you.

  4. Please run the following diagnostic commands and copy and paste the results (in between lines containing just ``` which will preserve the formatting):

  • cli -c "system version"
  • lsblk -bo NAME,MODEL,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
  • lspci
  • sas2flash -list
  • sas3flash -list
  • sudo zpool status -v
  • sudo zpool import
  5. Whatever you do, do NOT run any commands that will make any changes to the disks in question, as this could make things worse and reduce the chances of recovering your data.
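To make the read-only option in point 3 concrete, here is roughly what it would look like - a sketch only, NOT something to run yet, and the /mnt/recovery altroot path is just an example:

```
# Do NOT run this yet - illustration only of the read-only import mentioned in point 3.
# readonly=on stops ZFS from writing anything to the pool; -R mounts the datasets
# under an alternate root (/mnt/recovery is an arbitrary example path).
sudo zpool import -o readonly=on -R /mnt/recovery data
```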

Thanks.

P.S. If you are new to TrueNAS, roughly when did you install it, and why did you choose 23.10?

```
NAME     MODEL                   PTTYPE TYPE     START          SIZE PARTTYPENAME         PARTUUID
sda      TOSHIBA MQ01ABD100      gpt    disk           1000204886016
├─sda1                           gpt    part       128    2147483648 FreeBSD swap         b22cef3e-ce76-11ee-9fb8-d45d64208664
└─sda2                           gpt    part   4194432  998057316352 FreeBSD ZFS          b2639d11-ce76-11ee-9fb8-d45d64208664
sdb      WDC WD5000AAKX-22ERMA0  gpt    disk            500107862016
├─sdb1                           gpt    part       128    2147483648 FreeBSD swap         b226e7fc-ce76-11ee-9fb8-d45d64208664
└─sdb2                           gpt    part   4194432  497960292352 FreeBSD ZFS          b25df2bc-ce76-11ee-9fb8-d45d64208664
sdc      EMTEC X250 256GB        gpt    disk            256060514304
├─sdc1                           gpt    part        40     272629760 EFI System           3bcb9985-ce6f-11ee-b719-d45d64208664
├─sdc2                           gpt    part  34086952  238605565952 FreeBSD ZFS          3bd3eff9-ce6f-11ee-b719-d45d64208664
└─sdc3                           gpt    part    532520   17179869184 FreeBSD swap         3bd0a529-ce6f-11ee-b719-d45d64208664
  └─sdc3                                crypt            17179869184
sdd      Hitachi HDS721050CLA362 gpt    disk            500107862016
├─sdd1                           gpt    part       128    2147483648 FreeBSD swap         b2112368-ce76-11ee-9fb8-d45d64208664
└─sdd2                           gpt    part   4194432  497960292352 FreeBSD ZFS          b24ce596-ce76-11ee-9fb8-d45d64208664
sde      Portable SSD            gpt    disk           1000204886016
└─sde1                           gpt    part      2048 1000202788864 Microsoft basic data 92d20b67-e82f-4944-abca-2da5e20fd9f3
```

```
root@truenas:/# lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 0d)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 0d)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:17.0 SATA controller: Intel Corporation 200 Series PCH SATA controller [AHCI mode]
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0)
00:1c.7 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #8 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #11 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a2ca
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP108 High Definition Audio Controller (rev a1)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
```

```
root@truenas:/# sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

        No LSI SAS adapters found! Limited Command Set Available!
        ERROR: Command Not allowed without an adapter!
        ERROR: Couldn't Create Command -list
        Exiting Program.
```

```
root@truenas:/# sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

        No Avago SAS adapters found! Limited Command Set Available!
        ERROR: Command Not allowed without an adapter!
        ERROR: Couldn't Create Command -list
        Exiting Program.
```

```
root@truenas:/# sudo zpool status -v
  pool: boot-pool
 state: ONLINE
status: One or more features are enabled on the pool despite not being
        requested by the 'compatibility' property.
action: Consider setting 'compatibility' to an appropriate value, or
        adding needed features to the relevant file in
        /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d.
  scan: scrub repaired 0B in 00:00:26 with 0 errors on Fri Nov  8 03:45:27 2024
config:

        NAME                                       STATE     READ WRITE CKSUM
        boot-pool                                  ONLINE       0     0     0
          ata-EMTEC_X250_256GB_A2205CW03290-part2  ONLINE       0     0     0

errors: No known data errors
```

```
root@truenas:/# sudo zpool import
   pool: data
     id: 15103091714514370022
  state: FAULTED
status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
 config:

        data                                    FAULTED  corrupted data
          b25df2bc-ce76-11ee-9fb8-d45d64208664  ONLINE
          b24ce596-ce76-11ee-9fb8-d45d64208664  ONLINE
```

I don't know how serious it is

Bare metal installation?
As reported, the pool ‘data’ is a stripe :scream: (RAID0 equivalent), and you can try to import it with
sudo zpool import -f data
But the condition is potentially serious.

Edit: correction prompted by @Protopia.

  1. IMO you should not yet try to import it with the -f flag - we need to be a little more thoughtful about recovery before doing something that might make the data corruption worse.

  2. The pool data appears to me to be a stripe and NOT a mirror or RAIDZ - and if this is the case you do not have ANY redundancy for your data, and any error on either of the drives will lose you the data on both of them.

  3. The pool data consists of only 2 drives, not 3 as you told us. The UUIDs in the zpool import output point to the WDC WD5000AAKX 500GB drive and the Hitachi HDS721050CLA362 500GB drive. The TOSHIBA MQ01ABD100 1TB drive does not appear to be in use.

  4. Can you please run the cli -c "system version" command that I added earlier and report the results.

  5. Can you please run sudo zpool import -F -n data, which tells us what would happen if we ran a recovery-mode import without actually doing it. NOTE: this uses a capital -F, whereas @etorix’s command used a lowercase -f; these attempt very different types of import (see the sketch after this list).

  6. Do you have any idea how much data you had on this pool? If we mount the pool read-only, do you have anywhere to copy your data to?
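For clarity, here is how the various import flags that have come up so far differ - listed for reference only, nothing here should be run until we have the dry-run results:

```
# -f    : force the import even if the pool looks active on another system;
#         it does NOT attempt any repair or rewind.
sudo zpool import -f data

# -F    : recovery-mode import; rolls back the last few transactions to reach a
#         consistent state, which CAN discard the most recent writes.
sudo zpool import -F data

# -F -n : dry run of the recovery import - reports whether it would succeed and
#         what would be lost, without changing anything on disk.
sudo zpool import -F -n data
```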

My thoughts:

  1. IMO you have a non-redundant striped 1TB pool. We should probably try to get you to a situation where you have a redundant pool instead as you have enough drives to achieve a 1TB redundant pool.

  2. Dependent on what we can do to get this pool mounted, there are various routes to getting to a redundant pool, but it will depend on whether you have disk space elsewhere that you can copy the data to. Worst case, we seem to have a spare 1TB drive we can use temporarily to copy the data to.

  3. IMO we should first try to mount it read-only and take a backup copy of the data. Then we can try to do a recovery mode import of the pool which will try to roll back the ZFS transactions to an earlier point in time where the pool wasn’t corrupt.

  4. TrueNAS is intended to make pools from similarly sized drives, not drives of different sizes. Assuming we move the data off somewhere else, we seem to have a choice of using the 2x 500GB drives and the 1x 1TB drive either to create a standard 3x 500GB RAIDZ1, or to create 2x 500GB mirrors by splitting the 1TB drive in two and using each half as a mirror for one of the 500GB drives (which is not really supported). Either way we end up with 1TB of usable space. (If we can’t move the data off elsewhere, we can use the 1TB drive as a temporary staging post, create a degraded RAIDZ1 on the 2x 500GB drives, move the data back, and then add and resilver the 1TB drive as a 3rd drive to bring the pool back to being non-degraded and redundant - a rough sketch of this placeholder approach follows.)
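For reference, this is the kind of placeholder trick the bracketed option in point 4 refers to - a generic ZFS sketch, NOT the TrueNAS UI workflow, and only to be considered after the data has been safely copied elsewhere. The pool name, file path, size and device names are all placeholders:

```
# Create a sparse placeholder file roughly the size of a 500GB disk (size is an assumption).
truncate -s 500G /root/placeholder.img

# Build a RAIDZ1 from the two real 500GB drives plus the placeholder file
# (/dev/sdX and /dev/sdY stand in for the real drives).
sudo zpool create -f newpool raidz1 /dev/sdX /dev/sdY /root/placeholder.img

# Offline and delete the placeholder, leaving the RAIDZ1 degraded but usable.
sudo zpool offline newpool /root/placeholder.img
rm /root/placeholder.img

# Later, after the data has been copied back off the 1TB drive, replace the
# placeholder with the real 1TB drive (/dev/sdZ) and let it resilver.
sudo zpool replace newpool /root/placeholder.img /dev/sdZ
```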

P.S. As far as I can tell, all drives are CMR and not SMR which is a bonus.

P.P.S.

In my view the condition is definitely serious.

P.P.P.S @Cusssy

To get the formatting nice you want to paste the output as follows:

```
(pasted text for one command)
```

You don’t have to edit your previous post to fix this, but if you did it would make reading it a little easier for others who follow.

I don’t understand. I remember creating the pool and selecting the RAID5 option in the “layout” dropdown. I’m very sorry, but I’m sure it used the 3 disks; the pool was about 1.69TB in total.

I connected a 1TB SSD that I had at home because I read that someone recovered their data by connecting an extra disk, but that doesn’t seem to be my case.

What you say about mounting the pool read-only and copying the data to other disks sounds like the best option right now. I can make space on my computer’s disks and copy everything important there.

```
root@truenas:/# cli -c "system version"
TrueNAS-SCALE-23.10.2
```

and the command sudo zpool import -F -n data simply doesn’t return anything.

Ok. If you think you used 2x 500GB and 1x 1TB to make a pool which was > 1TB in size, then it had to have been a stripe. Indeed 1.69TiB (TiB = 2^40 bytes) corresponds to roughly 2TB (TB = 10^12 bytes) of raw disk, i.e. the combined capacity of all three drives. So this would be a stripe across all three drives.

(Note: You could not have selected RAID5 in a dropdown because that is not an option in TrueNAS.)

So now we have a much more serious problem, i.e. that the pool is broken because one of the striped disks is somehow no longer part of the pool.

Just adding the Portable SSD would NOT have caused this - regardless of what you read, adding a new 1TB drive to the system would do nothing by itself - you would have to tell ZFS to replace one drive with another using the UI or a shell command.

Unfortunately I have absolutely no idea how a drive might spontaneously become detached from a pool, nor how to reattach it once it has happened. Whilst we can still try to mount the pool with increasingly desperate and unlikely commands, I fear that the pool is irretrievably lost and (unless you pay a data recovery company a huge amount of money) your data is gone forever.
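For completeness, the “increasingly desperate” commands referred to above usually mean the extreme-rewind form of the recovery import. I mention it only so you know it exists; it is very much a last resort:

```
# LAST RESORT ONLY - the "extreme rewind" recovery import (-X is used together
# with -F; see zpool-import(8)). It can run for a very long time and it discards
# recent transactions. Do not run this without a block-level copy of the disks
# and/or more expert advice.
sudo zpool import -fFX data
```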

At this point I have reached the limit of my ZFS knowledge - if there is anyone more expert who might know how to recover the detached drive and recover the pool to the point that you can even read the data, please chip in now.

If you decide that your data has indeed gone, and you want to create a new pool from the existing disks, then do come back for more advice on how to achieve this, (and more importantly how to ensure that your data is safeguarded in the future).

But the data is still there, it’s just that you can’t access it, right?

In the list of disks, it appears that the two 500GB disks are part of the data pool, but the 1TB HDD is not recognized as part of the pool.

We’ll need @HoneyBadger here… If it were a stripe of 3 drives, ZFS would report a missing drive.
Please provide the output of
sudo zdb -l /dev/sda1
sudo zdb -l /dev/sdb1
sudo zdb -l /dev/sdd1
so we can see what’s in there.

It does indeed look like the data is still there, and a very expensive data recovery firm can probably retrieve it for you. But if you want to avoid that and get the data yourself then, as far as my own knowledge goes, there is no way to reattach the disk to the pool and then import it, nor to import a pool which now only has 2 of the 3 drives attached.

I am doing a little research though, and here are a few more commands to try (but I have no idea whether I have these right or not):

  • sudo zpool history data | tail -n 50
  • sudo zdb -e data
  • sudo zdb -eC data
  • sudo zdb -eh -AAA data | tail -n 50

As far as I can tell, these are read-only and will not make things worse.
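One small addition, in case zdb -e cannot find the pool’s devices on its own: zdb accepts -p to point it at a directory to search for devices, and this is still read-only. Using the partuuid directory from the lsblk output above:

```
# Read-only: tell zdb where to look for the pool's member devices.
sudo zdb -e -p /dev/disk/by-partuuid data
```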

It doesn’t look good

```
root@truenas:/# sudo zdb -l /dev/sda1
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
root@truenas:/# sudo zdb -l /dev/sdb1
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
root@truenas:/# sudo zdb -l /dev/sdd1
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
```

My pool is fine, but sudo zdb -l /dev/sda gives exactly the same errors as you are getting.

So I don’t think that the output you got actually tells us anything.

I’m not sure if this is good or bad.

```
root@truenas:/# sudo zpool history data | tail -n 50
cannot open 'data': no such pool
root@truenas:/# sudo zdb -e data

Configuration for import:
        vdev_children: 3
        version: 5000
        pool_guid: 15103091714514370022
        name: 'data'
        state: 0
        hostid: 859060325
        hostname: 'truenas'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 15103091714514370022
            children[0]:
                type: 'missing'
                id: 0
                guid: 0
            children[1]:
                type: 'disk'
                id: 1
                guid: 7372822762195821153
                metaslab_array: 264
                metaslab_shift: 32
                ashift: 12
                asize: 497955373056
                is_log: 0
                DTL: 29174
                create_txg: 4
                path: '/dev/disk/by-partuuid/b25df2bc-ce76-11ee-9fb8-d45d64208664'
            children[2]:
                type: 'disk'
                id: 2
                guid: 5094482715523507685
                metaslab_array: 256
                metaslab_shift: 32
                ashift: 12
                asize: 497955373056
                is_log: 0
                DTL: 29175
                create_txg: 4
                path: '/dev/disk/by-partuuid/b24ce596-ce76-11ee-9fb8-d45d64208664'
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 2
zdb: can't open 'data': Input/output error

ZFS_DBGMSG(zdb) START:
spa.c:6521:spa_import(): spa_import: importing data
spa_misc.c:418:spa_load_note(): spa_load(data, config trusted): LOADING
vdev.c:161:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/b25df2bc-ce76-11ee-9fb8-d45d64208664': best uberblock found for spa data. txg 3382013
spa_misc.c:418:spa_load_note(): spa_load(data, config untrusted): using uberblock with txg=3382013
spa_misc.c:404:spa_load_failed(): spa_load(data, config untrusted): FAILED: unable to open rootbp in dsl_pool_init [error=6]
spa_misc.c:418:spa_load_note(): spa_load(data, config untrusted): UNLOADING
ZFS_DBGMSG(zdb) END
root@truenas:/# sudo zdb -eC data
zdb: can't open 'data': Input/output error
root@truenas:/# sudo zdb -eh -AAA data | tail -n 50
zdb: can't open 'data': Input/output error
```

Ok - so this tells us that there is a disk missing, which we already knew. But what it suggests to me is that ZFS actually knows a drive is missing and that this is not just a 2-drive pool - so if we can somehow recreate the data for the missing hard drive (no idea how), then the pool might come back to life.

P.S. It would sure be helpful if the zpool import gave us something like this when ZFS knows that a device is missing:

        data                                    FAULTED  corrupted data
          b2639d11-ce76-11ee-9fb8-d45d64208664  MISSING
          b25df2bc-ce76-11ee-9fb8-d45d64208664  ONLINE
          b24ce596-ce76-11ee-9fb8-d45d64208664  ONLINE

Sorry, I copied the commands from a different layout. Yours would be
sudo zdb -l /dev/sda2
sudo zdb -l /dev/sdb2
sudo zdb -l /dev/sdd2

The question is why the 1 TB drive sda is physically present but not identified by ZFS.

Now it’s working - good or bad?

```
root@truenas:/# sudo zdb -l /dev/sda2
failed to read label 0
failed to read label 1
------------------------------------
LABEL 2
------------------------------------
    version: 5000
    name: 'data'
    state: 0
    txg: 3303346
    pool_guid: 15103091714514370022
    errata: 0
    hostid: 859060325
    hostname: 'truenas'
    top_guid: 12879759490428939429
    guid: 12879759490428939429
    vdev_children: 3
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 12879759490428939429
        path: '/dev/disk/by-partuuid/b2639d11-ce76-11ee-9fb8-d45d64208664'
        phys_path: 'id1,enc@n3061686369656d30/type@0/slot@1/elmdesc@Slot_00/p2'
        metaslab_array: 270
        metaslab_shift: 33
        ashift: 12
        asize: 998052462592
        is_log: 0
        DTL: 29176
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 2 3
root@truenas:/# sudo zdb -l /dev/sdb2
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'data'
    state: 0
    txg: 3303346
    pool_guid: 15103091714514370022
    errata: 0
    hostid: 859060325
    hostname: 'truenas'
    top_guid: 7372822762195821153
    guid: 7372822762195821153
    vdev_children: 3
    vdev_tree:
        type: 'disk'
        id: 1
        guid: 7372822762195821153
        path: '/dev/disk/by-partuuid/b25df2bc-ce76-11ee-9fb8-d45d64208664'
        phys_path: 'id1,enc@n3061686369656d30/type@0/slot@3/elmdesc@Slot_02/p2'
        metaslab_array: 264
        metaslab_shift: 32
        ashift: 12
        asize: 497955373056
        is_log: 0
        DTL: 29174
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
root@truenas:/# sudo zdb -l /dev/sdd2
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'data'
    state: 0
    txg: 3303346
    pool_guid: 15103091714514370022
    errata: 0
    hostid: 859060325
    hostname: 'truenas'
    top_guid: 5094482715523507685
    guid: 5094482715523507685
    vdev_children: 3
    vdev_tree:
        type: 'disk'
        id: 2
        guid: 5094482715523507685
        path: '/dev/disk/by-partuuid/b24ce596-ce76-11ee-9fb8-d45d64208664'
        phys_path: 'id1,enc@n3061686369656d30/type@0/slot@4/elmdesc@Slot_03/p2'
        metaslab_array: 256
        metaslab_shift: 32
        ashift: 12
        asize: 497955373056
        is_log: 0
        DTL: 29175
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
```

What this last output suggests is that the data is still there.

I found a Reddit post that might help, and it suggests that you try:

  • sudo zpool import -d /dev/disk/by-id/ -Fn data
  • sudo zpool import -d /dev/disk/by-id/ data

then do sudo zpool status to see if it is mounted.

Please copy and paste the output as before.
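If the second command (the actual import) is attempted, a more cautious variant - just a suggestion on my part - would be to add readonly=on so that nothing gets written to the pool:

```
# Same import, but read-only and mounted under an alternate root (example path).
sudo zpool import -d /dev/disk/by-id/ -o readonly=on -R /mnt/recovery data
```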

For /dev/disk/by-id/, do I have to change it to, for example, /dev/disk/by-partuuid/b25df2bc-ce76-11ee-9fb8-d45d64208664? Sorry for the silly question, but I don’t want to execute something and break everything :pray:

You are right to be cautious - let’s try the following first (which should only tell us what can be imported and shouldn’t try to do an actual import):

  • sudo zpool import -d /dev/disk/by-id/
  • sudo zpool import -d /dev/disk/by-partuuid/

Let’s see what both of these say and then we can decide which we can use to try an actual import.
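Assuming one of those listings shows the pool ‘data’ with its devices, a cautious next step would be a read-only import from whichever device directory worked, followed by copying the data off. A sketch only - the altroot, the rsync source (check it against the zfs list output) and the backup destination are all placeholders to adapt:

```
# Read-only import using whichever device directory listed the pool correctly.
sudo zpool import -d /dev/disk/by-partuuid/ -o readonly=on -R /mnt/recovery data

# Confirm the pool imported and see where the datasets were mounted
# (with -R the mountpoints end up under /mnt/recovery).
sudo zpool status -v data
sudo zfs list -r -o name,mountpoint data

# Copy everything important elsewhere (both paths below are placeholders).
sudo rsync -avh --progress /mnt/recovery/mnt/data/ /path/to/backup/
```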