Pool status is (unavailable), I can't access the pools' data

Hi, we have a TrueNAS (12) system with 4 pools in our storage. Two of these pools (TPM_STORAGE_01, TPM_STORAGE_02) use RAIDZ2 vdevs for data, with mirrored drives for cache and log. The system was operating without issues, but after a routine restart the status of the two pools using log drives became unknown, while the other two pools continued to function normally.

We attempted to import the pools showing an unknown status due to the log drive issue, but were unsuccessful. Trying to import each pool separately also failed, consistently returning an error stating the pool was not available, and we were unable to execute any task at the pool level for the same reason. SMART checks on the 4 log drives (2 for each affected pool) also failed with errors, preventing further diagnosis. We suspect that all 4 log drives were corrupted simultaneously, though we're unsure what caused this. We urgently need assistance restoring the pools without losing any data. Below are the results of the import attempts; the 'zpool status' command shows only the healthy pools, not the ones marked as unknown.


root@SRV3:~ # zpool import
pool: TPM_STORAGE_02
id: 9038427024247942209
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-6X
config:

    TPM_STORAGE_02                                  UNAVAIL  missing device
      raidz2-0                                      ONLINE
        gptid/459e3901-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/46816177-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/47599791-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/4839f3eb-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/4913cb3f-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/4cb6e973-60bf-11e8-afbd-0cc47a1e183a  ONLINE
      raidz2-1                                      ONLINE
        gptid/466c7202-b8ce-11ed-9cd5-0cc47a1e183a  ONLINE
        gptid/4ee15f58-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/4fc29728-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/50a1a8ce-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/5188ef3a-60bf-11e8-afbd-0cc47a1e183a  ONLINE
        gptid/526ebd59-60bf-11e8-afbd-0cc47a1e183a  ONLINE
      raidz2-3                                      ONLINE
        gptid/3c0ab4d8-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/3ce59879-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/3dbdf5d1-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/3e9b5caf-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/3f7351c2-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/40513800-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
      raidz2-4                                      ONLINE
        gptid/529ceb90-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/537df4e5-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/545609eb-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/553c5094-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/563258fb-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
        gptid/5724479d-01ea-11e9-a2e3-0cc47a1e183a  ONLINE
      raidz2-5                                      ONLINE
        da64p2                                      ONLINE
        da63p2                                      ONLINE
        da61p2                                      ONLINE
        da62p2                                      ONLINE
        da55p2                                      ONLINE
        da56p2                                      ONLINE
      raidz2-6                                      ONLINE
        gptid/36673d6c-de51-11eb-bee9-0cc47a1e183a  ONLINE
        da77p2                                      ONLINE
        gptid/37ee0a14-de51-11eb-bee9-0cc47a1e183a  ONLINE
        gptid/3905a0ef-de51-11eb-bee9-0cc47a1e183a  ONLINE
        gptid/3923492c-de51-11eb-bee9-0cc47a1e183a  ONLINE
        gptid/393a13c7-de51-11eb-bee9-0cc47a1e183a  ONLINE
    logs
      mirror-2                                      UNAVAIL  insufficient replicas
        gptid/542cb5f8-60bf-11e8-afbd-0cc47a1e183a  UNAVAIL  cannot open
        gptid/54a1308d-60bf-11e8-afbd-0cc47a1e183a  UNAVAIL  cannot open

    Additional devices are known to be part of this pool, though their
    exact configuration cannot be determined.

pool: TPM_STORAGE_01
id: 11666779640591141191
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-6X
config:

    TPM_STORAGE_01                                  UNAVAIL  missing device
      raidz2-1                                      ONLINE
        gptid/de38dbe6-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/df41449e-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e044613e-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e169721f-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e274d902-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e36e4f1d-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e47aab42-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e579457d-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e67ec5c2-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e783d559-9e58-11e5-a7af-0cc47a1e183a  ONLINE
      raidz2-2                                      ONLINE
        gptid/e8ad867f-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/e9b8f9ae-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/eab65a7a-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/ebcc997d-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/ecd6fc0f-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/ede678dd-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/eee1ed71-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/efdc04a8-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/f0d92f56-9e58-11e5-a7af-0cc47a1e183a  ONLINE
        gptid/f1e707a6-9e58-11e5-a7af-0cc47a1e183a  ONLINE
    logs
      mirror-0                                      UNAVAIL  insufficient replicas
        gptid/dd0fd3d6-9e58-11e5-a7af-0cc47a1e183a  UNAVAIL  cannot open
        gptid/f1e707a6-9e58-11e5-a7af-0cc47a1e183a  UNAVAIL  cannot open

This is the result of SMART (all 4 drives show the same error): da0 smartctl: Long (extended) offline self test failed [medium or hardware error (serious)]
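For reference, we ran the test and read back the result with the standard smartctl commands, roughly as follows (da0 is one of the affected log drives):

    smartctl -t long /dev/da0      # start the long (extended) offline self-test
    smartctl -l selftest /dev/da0  # read the self-test log once it completes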

Welcome to the TrueNAS forums!

The manual page for zpool import says that if you use the -m option, you can import a pool with missing log devices.
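Using your pool names from the output above, that would look something like this; if you want to be extra careful, you could also add -o readonly=on on the first attempt so nothing is written until you are satisfied:

    zpool import -m TPM_STORAGE_01
    zpool import -m TPM_STORAGE_02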

Naturally, if there were pending synchronous writes from NFS or iSCSI, data could be lost…

Thanks @Arwen for your response. I saw this option, but I was careful not to execute anything before making sure it would not affect the data. When you say data could be lost if there were pending synchronous writes from NFS or iSCSI, do you mean the whole data on the pools, or only the data that was pending?
Also, how do you think I can restore the log drives afterwards, in case everything goes well? Their status is unavailable (cannot open), as you can see above.

Just the pending data. All the normal pool data appears perfectly fine. You can run a scrub after importing to verify.
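For example, for each pool:

    zpool scrub TPM_STORAGE_01
    zpool status -v TPM_STORAGE_01   # shows scrub progress and any errors found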

If you import the pool WITHOUT the log devices attached, then even if you were later able to “fix” those log devices, they would be useless. ZFS is a transaction-based file system. Once you import a pool read/write, it moves on with life. Any old log devices become worthless.
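In that case, you would remove the dead log mirrors after the import so the pools stop expecting them. The vdev names below are taken from your import listing; confirm them with zpool status first:

    zpool remove TPM_STORAGE_01 mirror-0
    zpool remove TPM_STORAGE_02 mirror-2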

I have no clue on how to recover your log devices.

One thing that bites less experienced ZFS users in regards to log devices is that a ZFS SLOG / log device is mostly write-only, and can wear out consumer SSDs faster than expected. If that happens online, no problem: ZFS simply falls back from the SLOG (Separate Intent Log device) to the pool's internal log (aka ZIL, ZFS Intent Log).

Further, it is highly suggested that any ZFS SLOG device have PLP (Power Loss Protection), which limits which SSD and NVMe devices are suitable. This is to protect the synchronous data in the event of a power loss on the server (like a power supply failure).
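As an aside, on a working SAS SSD you can check how much endurance has been consumed with smartctl; the exact wording varies by vendor, but on SAS drives look for the "Percentage used endurance indicator" line in the output of:

    smartctl -a /dev/da0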


Real question is where are your log devices?

What were they? How were they connected? Why are they all missing?

Maybe something else failed.

And if you can restore them, then you can recover any in-flight data.

But only if you haven’t already imported the pool.

Which is why the pool didn’t auto-import.

Question: Since the system was an intentionally commanded restart, wouldn't the logs have committed the data beforehand?

@palmu The “restart”, was it a reboot, a shutdown / power off / power on, or a failure?

And as @Stux asked, I too am curious what the drives are (make/model), how are they connected, all those details can be very important. Also, do you perform routine SMART tests on your drives?

I would expect that.

Thank you. I managed to bring the pools online again without the log and cache drives. What's strange is that all 8 SSDs (cache and log) across two different chassis became unavailable simultaneously after the intentional reboot we performed. All SMART checks on these disks fail, and I'm trying to find out what could be the reason.

Like others have said, it would be good to know the make and model of the drives. High write endurance is a requirement for SLOGs. Also, were the drives over-provisioned? The idea that 4 SSDs purchased at the same time, doing the same job in the same systems for X number of years, would die at exactly the same time seems pretty plausible to me.

The drives are TOSHIBA PX02SMF020, 200 GB SFF SAS 12 Gbps (8 drives). The MTBF for those drives is supposed to be 2,000,000 hours, while they were in service for less than 50,000 hours. The storage hardware is a Supermicro 846-9 6047R-E1R24L with a JBOD SSG-6038R-E1CR16L.

‘Endurance TBW: 3.7PB (200GB)’

Could you have exceeded 3.7PB during their life?

Did you over-provision them?

I don't think so! How can we check this?

SMART, but it sounds like the drives are no longer being detected, so you can't check this? Do you have any historic SMART reports you could look at?
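If one of the drives were still detected, the lifetime write total on a SAS SSD would show up in the smartctl error counter log (the "Gigabytes processed" column on the write row):

    smartctl -x /dev/da0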

Unfortunately, no.

Hard to be sure, then. How old are the drives? How big are your pools? What are they used for? I'm trying to build up a picture of their potential endurance over their lifetime.

I think they are 7 years old. The first pool is 27 TB and the second one is 100 TB. They are used to store media files, documents, and VMs; no high load.

What protocol are you using, NFS or SMB?

Both.

Do you have sync enabled on all your datasets, or do you leave it at the default and allow the client to decide?

The standard setting is used on all datasets.
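For reference, the per-dataset setting can be listed with the command below. With sync=standard, NFS clients typically request synchronous writes, so the SLOG would have been in regular use:

    zfs get -r sync TPM_STORAGE_01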