I believe that the import may have failed because a directory already existed at the mount point.
But it is weird that zpool status says the pool is online but zfs mount says otherwise.
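A quick way to test that theory the next time the pool is imported (a sketch only, assuming the dataset is meant to mount at /mnt/pool_a):
zfs get mounted,mountpoint pool_a   # the dataset's own view: is it mounted, and where should it go?
ls -la /mnt/pool_a                  # a leftover, non-empty directory here would block the mount
By default ZFS refuses to mount a dataset over a non-empty directory, which would produce exactly this "online but not mounted" picture.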
Yes - we don’t seem to have much to lose at this point. Let’s try it as sudo zpool import -o altroot=/mnt pool_a
without the force flag and see what happens.
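For reference, altroot is simply prepended to every dataset mountpoint, so nothing collides with paths the running system already uses - the same command, annotated as a sketch:
# altroot=/mnt makes pool_a's datasets mount under /mnt/... instead of at their recorded mountpoints
sudo zpool import -o altroot=/mnt pool_a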
So, before the next step, here is the current state:
root@truenas[~]# sudo zpool status -v
  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:01 with 0 errors on Wed Oct 9 03:46:03 2024
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

  pool: pool_b
 state: ONLINE
config:

        NAME                                    STATE     READ WRITE CKSUM
        pool_b                                  ONLINE       0     0     0
          c3c95aae-6a89-4494-b079-fc12934f725b  ONLINE       0     0     0

errors: No known data errors
And then - nothing changed:
root@truenas[~]# sudo zpool import -o altroot=/mnt pool_a
cannot import 'pool_a': insufficient replicas
Destroy and re-create the pool from
a backup source.
root@truenas[~]# ls -la /mnt
total 10
drwxr-xr-x 4 root root 4 Oct 11 15:09 .
drwxr-xr-x 21 root root 29 Oct 4 03:50 ..
drwxr-xr-x 2 root root 2 Oct 6 11:46 .ix-apps
drwxr-xr-x 2 root root 2 Oct 10 09:51 pool_b
Can you try the readonly import command you used last time but add the altroot bit to it?
sudo zpool import -o readonly=on,altroot=/mnt pool_a
(I am running out of ideas.) It will be time to admit defeat soon unless someone more expert than I am can help you.
root@truenas[~]# sudo zpool import -o readonly=on -o altroot=/mnt pool_a
root@truenas[~]# ls -la /mnt
total 43
drwxr-xr-x 5 root root 5 Oct 11 18:01 .
drwxr-xr-x 21 root root 29 Oct 4 03:50 ..
drwxr-xr-x 2 root root 2 Oct 6 11:46 .ix-apps
drwxrwxrwx 8 ja root 8 Oct 6 12:41 pool_a
drwxr-xr-x 2 root root 2 Oct 10 09:51 pool_b
root@truenas[~]# ls -la /mnt/pool_a
total 78
drwxrwxrwx 8 ja root 8 Oct 6 12:41 .
drwxr-xr-x 5 root root 5 Oct 11 18:01 ..
drwxr-xr-x 11 ja root 12 Oct 6 11:18 STORAGE
drwxrwxrwx 8 ja root 9 Nov 21 2022 VM
drwxrwx--- 4 root root 4 Oct 7 10:27 appData
drwxr-xr-x 2 root root 8 Oct 6 00:08 covert
drwxr-xr-x 8 root root 11 May 19 2023 ix-applications
drwxrwxrwx 5 reolink reolink 6 Oct 6 22:20 reolink
As far as I'm concerned, you are a REAL MASTER!!
It looks like everything is there :-))))
And now I have a question - what next?
What happened?
Will this state persist after a restart? Or should I copy off what can be copied now?
Run a SCRUB?
I suspect that the standard mount won't work - it probably won't mount read-only by default.
My advice:
We should try to ensure you don’t get a repeat.
Please post details of your hardware, in particular your exact disk models and whether you use motherboard SATA ports or have an HBA, and if an HBA what model that is and whether it is in IT mode. Also confirm that your SMART details for all your drives are clean.
OK, I'm starting work now - so for the moment it's not known what caused it?
Should I put the hardware data here or somewhere else?
Here, so we can see if there’s a suspect and vet the configuration.
Ok, if there are no standard commands listing the necessary information, I will compile exactly what you asked for
As a starter:
lsblk -bo NAME,MODEL,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
lspci
sas2flash -list
sas3flash -list
Unfortunately, the TrueNAS community has blocked me from posting for 5 hours, but copying the data to the other pool is almost finished, so I'm starting a scrub.
After everything I will send confirmation of the status of the pool.
Board Manufacturer: Supermicro
Board Product Name: X11SCZ-F
4xSATA 12TB
RAM = 3x32GB PATRIOT PSP432G2662H1 2666 MHz CL19
pool_a = sdb, sdc, sdd
pool_b = sda
smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
lsblk -bo NAME,MODEL,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
NAME MODEL PTTYPE TYPE START SIZE PARTTYPENAME PARTUUID
sda TOSHIBA MG07ACA12TE gpt disk 12000138625024
└─sda1 gpt part 2048 12000136527872 Solaris /usr & Apple ZFS c3c95aae-6a89-4494-b079-fc12934f725b
sdb TOSHIBA MG07ACA12TE gpt disk 12000138625024
├─sdb1 gpt part 128 2147418624 Linux swap 648ef9cc-39c0-4563-83bd-825553c188d9
└─sdb2 gpt part 4194432 11997991058944 Solaris /usr & Apple ZFS 2fc128e1-b472-42ea-8a56-715eb5305916
sdc ST12000NE0008-2PK103 gpt disk 12000138625024
├─sdc1 gpt part 128 2147418624 Linux swap 09142a11-cd5b-42c3-b3eb-a66aea81c773
└─sdc2 gpt part 4194432 11997991058944 Solaris /usr & Apple ZFS 39d99900-6aa3-4c05-862e-73f24438a182
sdd ST12000NE0008-2PK103 gpt disk 12000138625024
├─sdd1 gpt part 128 2147418624 Linux swap 889e609b-9e1b-4f1f-b7c7-3836f53b1680
└─sdd2 gpt part 4194432 11997991058944 Solaris /usr & Apple ZFS d370c168-7b3b-4c12-9421-af4dd222fc09
zd0 disk 107374198784
zd16 dos disk 161061289984
zd32 disk 107374198784
zd48 dos disk 1099511644160
zd64 dos disk 2147500032
zd80 disk 11574951936
nvme0n1 SSDPEMKF256G8 NVMe INTEL 256GB gpt disk 256060514304
├─nvme0n1p1 gpt part 4096 1048576 BIOS boot d16d8765-eb51-491a-b5c5-df9286e187f3
├─nvme0n1p2 gpt part 6144 536870912 EFI System 9576796a-0147-4b87-9c5a-bfe5740577b0
├─nvme0n1p3 gpt part 34609152 238340611584 Solaris /usr & Apple ZFS 63c1f21e-b90b-47f5-9ca5-30e70ca1a197
└─nvme0n1p4 gpt part 1054720 17179869184 Linux swap 47bc055a-e609-4426-acf9-0573129a4aea
root@truenas[~]#
lspci
00:00.0 Host bridge: Intel Corporation 8th/9th Gen Core 8-core Desktop Processor Host Bridge/DRAM Registers [Coffee Lake S] (rev 0a)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:15.0 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #7 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation Cannon Point-LP LPC Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
03:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
05:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller
06:00.0 Non-Volatile memory controller: Intel Corporation SSD Pro 7600p/760p/E 6100p Series (rev 03)
sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved
No LSI SAS adapters found! Limited Command Set Available!
ERROR: Command Not allowed without an adapter!
ERROR: Couldn't Create Command -list
Exiting Program.
sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.
No Avago SAS adapters found! Limited Command Set Available!
ERROR: Command Not allowed without an adapter!
ERROR: Couldn't Create Command -list
Exiting Program.
I checked all HDDs and all completed without error. Do you need anything specific from SMART? I ask because there is so much data.
OK, backup done, so the adventures continue.
When I start a SCRUB from the browser menu, it simply does not execute - the disks make no noise, and progress is still at 0% after 7 hours.
When I try to stop it from the terminal, I am told that no SCRUB is running (but of course in the browser it still shows as running):
root@truenas[~]# zpool scrub -s pool_a
cannot cancel scrubbing pool_a: there is no active scrub
When I run a SCRUB from the terminal on pool_a, I do not get any confirmation that the command executed; the cursor just blinks on a new line.
When I run a SCRUB from the terminal on pool_b right after that, the SCRUB executes, and below is the current status of both SCRUBs:
  pool: pool_a
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 156M in 00:01:28 with 592 errors on Mon Oct 7 17:55:18 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool_a                                    ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            2fc128e1-b472-42ea-8a56-715eb5305916  ONLINE       0     0    3K
            39d99900-6aa3-4c05-862e-73f24438a182  ONLINE       0     0 3.01K
            d370c168-7b3b-4c12-9421-af4dd222fc09  ONLINE       0     0 3.01K

errors: 585 data errors, use '-v' for a list

  pool: pool_b
 state: ONLINE
  scan: scrub in progress since Sat Oct 12 14:00:09 2024
        3.01T / 4.05T scanned at 23.5G/s, 13.9G / 4.05T issued at 109M/s
        0B repaired, 0.33% done, 10:50:10 to go
config:

        NAME                                    STATE     READ WRITE CKSUM
        pool_b                                  ONLINE       0     0     0
          c3c95aae-6a89-4494-b079-fc12934f725b  ONLINE       0     0     0

errors: No known data errors
When I try to clear the errors, I am told that pool_a is in read-only mode:
root@truenas[~]# zpool clear pool_a
cannot clear errors for pool_a: pool is read-only
Of course I don't know anything about this, but it looks like the SCRUB for pool_a was being run in a place where this pool isn't.
Do you need anything specific from SMART?
smartctl -a /dev/sdX
for X = b, c, d
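If it helps, one way to dump all three in one go (a sketch, assuming the same sdb/sdc/sdd device names as above):
for X in b c d; do sudo smartctl -a /dev/sd$X; done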
A scrub would require writes.
If you have backed up, the next task is to rebuild. 585 errors is way too many. Some metadata errors may be cleared by getting rid of the corresponding damaged files, but top-level <0x0> is not likely to go away. (And, personally, I would not trust the hardware anyway.)
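For the file-level errors the usual routine is roughly the following (a sketch only - it assumes the pool can eventually be imported read-write, which it currently is not):
sudo zpool status -v pool_a   # lists the damaged files by path (or as <0x...> objects for metadata)
# delete or restore each listed file, then:
sudo zpool clear pool_a
sudo zpool scrub pool_a       # the error list is rebuilt by scrubs, so the count may take a scrub or two to drop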
If you want to play with pool_a before destroying it, I suppose you could unmount/export and then try, in order, the potentially destructive
sudo zpool import -fFn -o altroot=/mnt pool_a
sudo zpool import -fFXn -o altroot=/mnt pool_a
If any result looks interesting, remove the ‘n’ to do it “for real”.
Regarding SMART - all disks report:
SMART overall-health self-assessment test result: PASSED
Yes - I have a backup so I would like to repair it at least for educational purposes, so in order:
zpool offline pool_a
zpool export pool_a
to check:
sudo zpool import -fFn -o altroot=/mnt pool_a
sudo zpool import -fFXn -o altroot=/mnt pool_a
and next
sudo zpool import -fF -o altroot=/mnt pool_a
sudo zpool import -fFX -o altroot=/mnt pool_a
but when do I run any fix or scrub?
sudo zpool scrub pool_a
Is it possible to back up the configuration of permissions, SMB, NFS, or other settings for pool_a?
You may want to check for @Protopia’s advice. But I meant:
sudo zpool import -fFn -o altroot=/mnt pool_a
and, if the message looks interesting
sudo zpool import -fF -o altroot=/mnt pool_a
If not, then the most destructive option:
sudo zpool import -fFXn -o altroot=/mnt pool_a
‘n’ is a dry run. ‘F’ is recovery mode; ‘FX’ discards transactions until it finds an importable state.
If any of these succeeds in importing the pool, scrub should be possible (but not necessarily successful).
root@truenas[~]# sudo zpool import -fFn -o altroot=/mnt pool_a
cannot import 'pool_a': a pool with that name already exists
use the form 'zpool import <pool | id> <newpool>' to give it a new name
Should pool_a first be unmounted? Exported?
Yes, export first.
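So, put together, the whole sequence is roughly this (a sketch; the pool name and altroot are the ones already used above):
sudo zpool export pool_a                         # release the current import first
sudo zpool import -fFn -o altroot=/mnt pool_a    # dry run: reports what -F recovery would do, changes nothing
sudo zpool import -fF -o altroot=/mnt pool_a     # the real recovery import, if the dry run looks sane
sudo zpool scrub pool_a                          # only possible once the pool is imported read-write
with -fFXn / -fFX kept as the last resort.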
After exporting the pool, none of the commands can re-import it:
root@truenas[~]# sudo zpool import -fFn -o altroot=/mnt pool_a
no feedback, no confirmation
root@truenas[~]# sudo zpool import -fF -o altroot=/mnt pool_a
cannot import 'pool_a': insufficient replicas
Destroy and re-create the pool from
a backup source.
root@truenas[~]# sudo zpool import -fFX -o altroot=/mnt pool_a
cannot import 'pool_a': one or more devices is currently unavailable
After an import attempt via the web UI, without success, this alarm appeared:
.....
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1360, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 2095 is not a valid Error
As I said previously, you need to STOP using the UI and rely only on the command shell until you have a reliable pool.
Only then can you work on bringing the UI back into line with the underlying ZFS system.