Help! In my TrueNAS SCALE 24.0.1 a disk died in my zpool; I added two new disks to my config, but ALL disks are offline

Hello folks :slight_smile:
As said, one of my 16*10TB disks died and another was on the verge of doing so too, so I had to shut the system down for a while.
As soon as I could, I added two new disks to the system and rebooted, but…

Now the machine says that all disks are offline, with the zpool ‘exported’ and the new disks as not available (normal, as they are not formatted yet).

What bugs me is that if I want to import the zpool as it was (named myzpool), I have to choose between ‘new pool’ and ‘existing pool’… It sees that the disks belong to ‘myzpool’, but when I need to choose that existing zpool there is nothing in the chooser!

How can I re-import my pool and repair it with the new disks without losing all my files?

Can I really import the disks into a new pool without losing anything?
And why can’t I just re-import it, since it knows which zpool the disks belong to?

Thanks in advance for a speedy answer, as it’s really urgent.

What does the following command say?
zpool import

One of the problems with ZFS is that it does not like to import “damaged” pools. Now there could be a completely reasonable cause for that, like a bad disk that has not yet been replaced. It would be nice if the GUI had the ability to import such pools. But the intent is to keep the NAS running… which is different for SOHO users than for Enterprise users.

As with most UIs, the TrueNAS one only deals with the most common use cases, and there are just so many ways that pools can go wrong that the UI cannot hope to deal with all of them. And then you have to rely on the command line, and it is then easy to try the wrong thing and lose your pool completely.

I appreciate that you think doing something to get it back online is urgent, but please trust me when I say that taking it slower and getting your data back online again will be more important to you than rushing in, making a mistake and losing your data completely.

So we need some hard facts before we can help you and, as @Arwen says, the output of sudo zpool import is the most important.

But also please run the following additional commands which will also help us:

  • lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
  • lspci

If you can copy & paste the output of each command separately inside a </> box, that would be helpful.

P.S. Can you please describe your pool setup - is it a mirror or RAIDZ and if RAIDZ is it RAIDZ1 or RAIDZ2?

Also, please post your complete hardware setup, most importantly how the disks are connected to the motherboard.

Hmm … OP asks for “a speedy answer please as it’s really urgent” and he gets 3 speedy answers within a couple of hours, however it doesn’t really appear to be “really urgent” because here we are another 11 hours later and we still haven’t had the information requested 12 hours ago.

Hello there :slight_smile:
Sorry for the late reply, I’m somewhat of an invalid and one of my problems is that I fall asleep at any hour of the day :frowning:

Btw, looking at the output of the commands you asked for, I found that one of the degraded disks was not seen by TrueNAS SCALE, which is why it was not loading the zpool.
I’ve swapped the cable for a new one and now it sees the disk (damn 3.3V stuff on new disks…), so it is resilvering and replacing the disks now. Loooong wait :wink:

I really thank you all for replying so fast. I know that I can count on you when some problem occurs with TrueNAS :slight_smile:

Btw II :

My config is as follows :slight_smile:

Nas[~]$ sudo lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 7 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Z68 Express Chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
02:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
03:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
04:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
05:00.0 SATA controller: JMicron Technology Corp. JMB362 SATA Controller (rev 10)

With 17 (soon 18) 10 TB SATA disks, one 3 TB disk for the system pool and one 500 GB WD Red SSD for cache, on a SAS2008 card with a 24-port expander card.
Running on a 10 GB FTTH line.
My pool is RAIDZ2 with multiple datasets and SMB/NFS shares.

All in a neat Fractal Design Define XL R2 box :slight_smile: (heavy, but very sturdy!)
I’ll replace this one with a Define XL R7 soon (more room and better air flow).

Thanks again, and sorry for the noise.

Regards,
Jeff

Please run the following commands and post the results:

  • sudo sas2flash -list
  • sudo sas3flash -list

Hello :slight_smile:

here it is :

Nas[~]$ sudo sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18) 
Copyright (c) 2008-2014 LSI Corporation. All rights reserved 

        Adapter Selected is a LSI SAS: SAS2008(B2)   

        Controller Number              : 0
        Controller                     : SAS2008(B2)   
        PCI Address                    : 00:01:00:00
        SAS Address                    : 500605b-0-05f6-f320
        NVDATA Version (Default)       : 14.01.00.08
        NVDATA Version (Persistent)    : 14.01.00.08
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               : 20.00.07.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9211-8i
        BIOS Version                   : 07.39.02.00
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : SAS9211-8i
        Board Assembly                 : N/A
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.
Nas[~]$ sudo sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02) 
Copyright 2008-2017 Avago Technologies. All rights reserved.

        No Avago SAS adapters found! Limited Command Set Available!
        ERROR: Command Not allowed without an adapter!
        ERROR: Couldn't Create Command -list
        Exiting Program.

and for completeness :slight_smile:

Nas[~]$ sudo zpool status
  pool: elZ2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Feb 10 14:46:55 2025
        4.12T / 117T scanned at 2.20G/s, 2.90T / 117T issued at 1.55G/s
        416G resilvered, 2.47% done, 21:04:02 to go
expand: expanded raidz2-0 copied 111T in 4 days 10:00:04, on Mon Jan  6 05:44:54 2025
config:

        NAME                                        STATE     READ WRITE CKSUM
        elZ2                                          DEGRADED     0     0     0
          raidz2-0                                  DEGRADED     0     0     0
            aab036a6-7e2b-4632-b16d-cd7866845e5e    ONLINE       0     0     0
            7a8f8da1-836b-4a4c-9eaa-2aa29da2beb7    ONLINE       0     0     0
            a10d9ac9-ef2c-4ede-9e6b-eba01e901361    ONLINE       0     0     0
            9f2dce88-cde7-4923-b0bb-bf4de890ea52    ONLINE       0     0     0
            b74d4b85-0436-4814-8515-b405d7e40e24    ONLINE       0     0     0
            replacing-5                             DEGRADED     0     0     0
              24e3a0a0-411b-4174-85e4-83eb48744f9f  ONLINE       0     0    31  (resilvering)
              96d445ec-e8b2-42ce-8f1a-63ca963eead9  REMOVED      0     0     0
            replacing-6                             DEGRADED     0     0     0
              aa40633b-42bc-41f7-b8c2-555530227841  DEGRADED     0     0    30  too many errors
              b4dbc538-a334-417c-9b6a-7265693bc710  ONLINE       0     0     0  (resilvering)
            fd7f4ef7-237a-48c0-8a72-bebaf1cd3fb5    ONLINE       0     0     0
            83e62634-a0a5-4c2f-abb8-10b763833042    ONLINE       0     0     0
            ea3b911a-d478-4341-b4cd-4549385fcdde    ONLINE       0     0     1
            52c0cc19-1576-4186-b41f-8ee005e7ebaf    ONLINE       0     0     0
            ee9542ae-6266-4615-bcaa-6a0f06dafb87    ONLINE       0     0     0
            0b764a40-fd5d-4631-922d-625178717347    ONLINE       0     0     0
            e0df314c-cc1e-460e-9591-842264306d5b    ONLINE       0     0     0
            af112e1e-211c-4061-a0bf-4e7f943903ca    ONLINE       0     0     0
            c2c7b568-7eff-4f85-998b-eb4e1ac51897    ONLINE       0     0     0
        cache
          325bad3f-5323-47be-9736-e06e1f30603b      ONLINE       0     0     0

errors: 1636 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:04:39 with 0 errors on Mon Feb 10 03:49:42 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdt3      ONLINE       0     0     0

errors: No known data errors

Cheers :slight_smile:
Jeff

As soon as the resilvering has completed you need to run a scrub to try to clean up the 1636 data (checksum) errors - and once that has run you should do a sudo zpool status -v elZ2 to see what errors remain.
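
A minimal sketch of that sequence, assuming the pool name elZ2 from the status output above and a shell where zpool is on the PATH (run as root or via sudo). The helper names resilver_running and scrub_when_idle are made up for illustration; only zpool status and zpool scrub are real commands.

```shell
#!/bin/sh
# Succeeds (exit 0) if the `zpool status` text on stdin reports a running resilver.
resilver_running() {
    grep -q 'resilver in progress'
}

# Hypothetical wrapper: start a scrub only when no resilver is running.
# The pool name defaults to elZ2, the pool from this thread.
scrub_when_idle() {
    pool="${1:-elZ2}"
    if zpool status "$pool" | resilver_running; then
        echo "resilver still running on $pool, not starting a scrub"
        return 1
    fi
    zpool scrub "$pool" && echo "scrub started on $pool"
}

# Once the scrub itself has finished, list whatever errors remain:
#   zpool status -v elZ2
```

Calling scrub_when_idle elZ2 by hand after the resilver is probably enough; there is no need to poll in a tight loop for a job that takes most of a day.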

The sas2flash shows you are running IT firmware which is correct - I haven’t checked whether this is the latest firmware for this device.

As an aside I note that this is a 16-wide RAIDZ2 vDev and the recommended maximum width for new vDevs is 12. The recommendation is only based on resilvering times, and I note that you are looking at c. 1 day for your existing resilvers (which is not too bad) so I am not suggesting that you need to do anything about this existing vDev, just bear it in mind for the future.
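
As a rough cross-check of that "c. 1 day" figure, the remaining time can be recomputed from the numbers in the first status output (117T total, 2.90T issued, 1.55G/s). This sketch assumes the T and G suffixes are binary units (TiB and GiB/s) and that the issue rate stays constant, which it rarely does in practice; eta_hours is a hypothetical helper name.

```shell
#!/bin/sh
# eta_hours TOTAL_TIB ISSUED_TIB RATE_GIB_PER_S
# Prints the estimated hours left: (total - issued) converted to GiB, divided by the rate.
eta_hours() {
    LC_ALL=C awk -v total="$1" -v issued="$2" -v rate="$3" 'BEGIN {
        remaining_gib = (total - issued) * 1024
        printf "%.1f\n", remaining_gib / rate / 3600
    }'
}

eta_hours 117 2.90 1.55   # prints 20.9
```

That is close to the 21:04:02 that zpool status itself reported at that moment.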


Thanks :slight_smile: I wasn’t aware of that size limit for vdevs…
I’ll keep it in mind for the next one.

Btw, so for a petabyte-scale NAS/file server I would need lots of vdevs of 12 disks each! :wink:

It’s not a limit so much as a recommendation. There are AFAIK 3 reasons for limiting vDev width:

  1. Resilvering times
  2. IOPS
  3. Sub-optimal parity / Write amplification
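
To put a rough number on the petabyte question above: a back-of-the-envelope sketch, assuming 12-wide RAIDZ2 vDevs built from 10 TB disks like the ones in this thread. It counts only data disks (width minus parity) and ignores padding, metadata overhead and the free space a pool needs, so a real build would need more. vdevs_needed is a hypothetical helper name.

```shell
#!/bin/sh
# vdevs_needed TARGET_TB WIDTH PARITY DISK_TB
# Prints how many vDevs are needed to reach TARGET_TB, rounding up.
vdevs_needed() {
    target_tb="$1"; width="$2"; parity="$3"; disk_tb="$4"
    per_vdev=$(( (width - parity) * disk_tb ))          # usable TB per vDev, before overhead
    echo $(( (target_tb + per_vdev - 1) / per_vdev ))   # ceiling division
}

vdevs_needed 1000 12 2 10   # prints 10, i.e. 120 disks for 1 PB
```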

Good to know :slight_smile:
In that case, how many dedicated metadata disks would you suggest for such a configuration? Would adding them help?

Same question for the cache disk: does my 500 GB SSD suffice?

Would a dedup disk be necessary too?

Sorry for so many questions ^^)

You probably do not need any of the special types of vDev (SLOG, L2ARC, Dedup, Special Allocation (Metadata)).

SLOG is for synchronous writes only - and you should only do synchronous writes for the specific types of data that need them. But if those writes are to HDD, then you definitely will need an SLOG.

Dedup is renowned for having a very bad impact on performance and needing a LOT of memory. Don’t do it, just don’t. If you need to dedup, find or write a script which does dedup using block cloning.

L2ARC can help, but apparently only if you have >= 64GB of memory. You might be better off simply adding memory and using a script to scan the directories and read the metadata and cache it in memory.
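
A minimal sketch of such a metadata-warming script, assuming a dataset mounted somewhere under /mnt (the path in the usage comment is hypothetical). It walks a directory tree and stats every entry so the metadata has been read at least once; whether ZFS then keeps it in the ARC depends on memory pressure, so treat this as something to try and measure, not a guarantee. warm_metadata is a made-up helper name.

```shell
#!/bin/sh
# warm_metadata DIR
# Walks DIR; `find -ls` stats every entry it visits. The listing is discarded,
# only the number of entries visited is printed.
warm_metadata() {
    dir="${1:?usage: warm_metadata DIR}"
    find "$dir" -ls 2>/dev/null | wc -l
}

# Example (hypothetical path):
#   warm_metadata /mnt/elZ2/mydataset
```

Child datasets mounted below DIR are separate filesystems but are still walked here, because the sketch deliberately does not pass -xdev to find.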

A special allocation vDev (for metadata) can definitely help if you have a specific response time problem because your read activity is so random that the metadata isn’t in the cache. You need to try this, but bear in mind that for RAIDZ you cannot later remove this. If you are going to do this, bear in mind that the Metadata vDev is critical to the pool and if you lose it then you lose your pool - so it needs to be redundant and ideally at least as redundant as the data vDevs.


Thank you very much for these explanations :slight_smile:

We have been discussing this elsewhere, and it appears the old “rule” does not apply now, and probably was not as strict.

However, memory is always preferred.

Here is the discussion:

OK, so as I have 32 GB of RAM (mainboard maxed out here), the 500 GB L2ARC SSD is good enough :slight_smile:

BTW, it surprises me that although it seems to have finished replacing/resilvering one of the two faulty drives, what I see is that the new replacement drive is ‘offline’ while the faulty one is still ‘online’??? Shouldn’t the faulty drive have been replaced by the new one and be ‘offline’ instead?

Nas[~]$ sudo zpool status
  pool: elZ2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Feb 11 13:32:58 2025
        20.7T / 117T scanned at 1.78G/s, 19.5T / 117T issued at 1.68G/s
        145G resilvered, 16.61% done, 16:34:48 to go
expand: expanded raidz2-0 copied 111T in 4 days 10:00:04, on Mon Jan  6 05:44:54 2025
config:

        NAME                                        STATE     READ WRITE CKSUM
        elZ2                                          DEGRADED     0     0     0
          raidz2-0                                  DEGRADED     0     0     0
            aab036a6-7e2b-4632-b16d-cd7866845e5e    ONLINE       0     0     0
            7a8f8da1-836b-4a4c-9eaa-2aa29da2beb7    ONLINE       0     0     0
            a10d9ac9-ef2c-4ede-9e6b-eba01e901361    ONLINE       0     0     0
            9f2dce88-cde7-4923-b0bb-bf4de890ea52    ONLINE       0     0     0
            b74d4b85-0436-4814-8515-b405d7e40e24    ONLINE       0     0     0
            replacing-5                             DEGRADED     0     0     0
              24e3a0a0-411b-4174-85e4-83eb48744f9f  ONLINE       0     0    31
              96d445ec-e8b2-42ce-8f1a-63ca963eead9  REMOVED      0     0     0
            replacing-6                             DEGRADED     0     0     0
              aa40633b-42bc-41f7-b8c2-555530227841  DEGRADED     0     0    30  too many errors
              b4dbc538-a334-417c-9b6a-7265693bc710  REMOVED      0     0     0
            fd7f4ef7-237a-48c0-8a72-bebaf1cd3fb5    ONLINE       0     0     0
            83e62634-a0a5-4c2f-abb8-10b763833042    ONLINE       0     0     0
            ea3b911a-d478-4341-b4cd-4549385fcdde    ONLINE       0     0     0  (resilvering)
            52c0cc19-1576-4186-b41f-8ee005e7ebaf    ONLINE       0     0     0
            ee9542ae-6266-4615-bcaa-6a0f06dafb87    ONLINE       0     0     0
            0b764a40-fd5d-4631-922d-625178717347    ONLINE       0     0     0
            e0df314c-cc1e-460e-9591-842264306d5b    ONLINE       0     0     0
            af112e1e-211c-4061-a0bf-4e7f943903ca    ONLINE       0     0     0
            c2c7b568-7eff-4f85-998b-eb4e1ac51897    ONLINE       0     0     0
        cache
          325bad3f-5323-47be-9736-e06e1f30603b      ONLINE       0     0     0

errors: No known data errors
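
When reading a long listing like the one above, it can help to filter zpool status down to the rows that are not ONLINE or that carry non-zero error counters. A small sketch, assuming the usual five-column NAME/STATE/READ/WRITE/CKSUM layout shown in this thread; problem_devices is a hypothetical helper name and the list of states is not guaranteed to be exhaustive.

```shell
#!/bin/sh
# Reads `zpool status` text on stdin and prints name, state and the three error
# counters for every row that is not ONLINE or has a non-zero counter.
problem_devices() {
    awk 'NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
         ($2 != "ONLINE" || $3 + $4 + $5 > 0) { print $1, $2, $3, $4, $5 }'
}

# Example:
#   sudo zpool status elZ2 | problem_devices
```

Container rows such as raidz2-0 or replacing-5 show up as well when they are DEGRADED, which makes it easier to see which leaf devices sit under which replacement.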