I believe that the import may have failed because a directory already existed at the mount point.
But it is weird that zpool status says the pool is online but zfs mount says otherwise.
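A quick way to test that theory the next time the pool is imported (a sketch only, assuming the dataset is meant to mount at /mnt/pool_a):
zfs get mounted,mountpoint pool_a   # the dataset's own view: is it mounted, and where should it go?
ls -la /mnt/pool_a                  # a leftover, non-empty directory here would block the mount
By default ZFS refuses to mount a dataset over a non-empty directory, which would produce exactly this "online but not mounted" picture.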
Yes - we don’t seem to have much to lose at this point. Let’s try it as sudo zpool import -o altroot=/mnt pool_a
without the force flag and see what happens.
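For reference, altroot is simply prepended to every dataset mountpoint, so nothing collides with paths the running system already uses - the same command, annotated as a sketch:
# altroot=/mnt makes pool_a's datasets mount under /mnt/... instead of at their recorded mountpoints
sudo zpool import -o altroot=/mnt pool_a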
So, before the next step, here is the current state:
root@truenas[~]# sudo zpool status -v
  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:01 with 0 errors on Wed Oct 9 03:46:03 2024
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

  pool: pool_b
 state: ONLINE
config:

        NAME                                    STATE     READ WRITE CKSUM
        pool_b                                  ONLINE       0     0     0
          c3c95aae-6a89-4494-b079-fc12934f725b  ONLINE       0     0     0

errors: No known data errors
And then - nothing changed:
root@truenas[~]# sudo zpool import -o altroot=/mnt pool_a
cannot import 'pool_a': insufficient replicas
Destroy and re-create the pool from
a backup source.
root@truenas[~]# ls -la /mnt
total 10
drwxr-xr-x 4 root root 4 Oct 11 15:09 .
drwxr-xr-x 21 root root 29 Oct 4 03:50 ..
drwxr-xr-x 2 root root 2 Oct 6 11:46 .ix-apps
drwxr-xr-x 2 root root 2 Oct 10 09:51 pool_b
Can you try the readonly import command you used last time but add the altroot bit to it?
sudo zpool import -o readonly=on,altroot=/mnt pool_a
(I am running out of ideas.) It will be time to admit defeat soon unless someone more expert than I am can help you.
root@truenas[~]# sudo zpool import -o readonly=on -o altroot=/mnt pool_a
root@truenas[~]# ls -la /mnt
total 43
drwxr-xr-x 5 root root 5 Oct 11 18:01 .
drwxr-xr-x 21 root root 29 Oct 4 03:50 ..
drwxr-xr-x 2 root root 2 Oct 6 11:46 .ix-apps
drwxrwxrwx 8 ja root 8 Oct 6 12:41 pool_a
drwxr-xr-x 2 root root 2 Oct 10 09:51 pool_b
root@truenas[~]# ls -la /mnt/pool_a
total 78
drwxrwxrwx 8 ja root 8 Oct 6 12:41 .
drwxr-xr-x 5 root root 5 Oct 11 18:01 ..
drwxr-xr-x 11 ja root 12 Oct 6 11:18 STORAGE
drwxrwxrwx 8 ja root 9 Nov 21 2022 VM
drwxrwx--- 4 root root 4 Oct 7 10:27 appData
drwxr-xr-x 2 root root 8 Oct 6 00:08 covert
drwxr-xr-x 8 root root 11 May 19 2023 ix-applications
drwxrwxrwx 5 reolink reolink 6 Oct 6 22:20 reolink
As far as I'm concerned, you are a REAL MASTER!!
It looks like everything is there :-))))
And now I have a question - what next?
What happened?
Will this state persist after a restart? Or should I copy off what can be copied now?
Run a SCRUB?
I suspect that the standard mount won't work - it probably won't mount read-only by default.
My advice:
We should try to ensure you don’t get a repeat.
Please post details of your hardware, in particular your exact disk models and whether you use motherboard SATA ports or have an HBA, and if an HBA what model that is and whether it is in IT mode. Also confirm that your SMART details for all your drives are clean.
OK, I'm starting work now - so for the moment it's not known what caused it?
Should I put the hardware data here or somewhere else?
Here, so we can see if there’s a suspect and vet the configuration.
Ok, if there are no standard commands listing the necessary information, I will compile exactly what you asked for
As a starter:
lsblk -bo NAME,MODEL,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
lspci
sas2flash -list
sas3flash -list
Unfortunately, the TrueNAS community has blocked me from posting for 5 hours, but copying the data to the other pool is almost finished, so I'm starting a scrub.
After everything I will send confirmation of the status of the pool.
Board Manufacturer: Supermicro
Board Product Name: X11SCZ-F
4xSATA 12TB
RAM = 3x32GB PATRIOT PSP432G2662H1 2666 MHz CL19
pool_a = sdb, sdc, sdd
pool_b = sda
smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
lsblk -bo NAME,MODEL,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
NAME MODEL PTTYPE TYPE START SIZE PARTTYPENAME PARTUUID
sda TOSHIBA MG07ACA12TE gpt disk 12000138625024
└─sda1 gpt part 2048 12000136527872 Solaris /usr & Apple ZFS c3c95aae-6a89-4494-b079-fc12934f725b
sdb TOSHIBA MG07ACA12TE gpt disk 12000138625024
├─sdb1 gpt part 128 2147418624 Linux swap 648ef9cc-39c0-4563-83bd-825553c188d9
└─sdb2 gpt part 4194432 11997991058944 Solaris /usr & Apple ZFS 2fc128e1-b472-42ea-8a56-715eb5305916
sdc ST12000NE0008-2PK103 gpt disk 12000138625024
├─sdc1 gpt part 128 2147418624 Linux swap 09142a11-cd5b-42c3-b3eb-a66aea81c773
└─sdc2 gpt part 4194432 11997991058944 Solaris /usr & Apple ZFS 39d99900-6aa3-4c05-862e-73f24438a182
sdd ST12000NE0008-2PK103 gpt disk 12000138625024
├─sdd1 gpt part 128 2147418624 Linux swap 889e609b-9e1b-4f1f-b7c7-3836f53b1680
└─sdd2 gpt part 4194432 11997991058944 Solaris /usr & Apple ZFS d370c168-7b3b-4c12-9421-af4dd222fc09
zd0 disk 107374198784
zd16 dos disk 161061289984
zd32 disk 107374198784
zd48 dos disk 1099511644160
zd64 dos disk 2147500032
zd80 disk 11574951936
nvme0n1 SSDPEMKF256G8 NVMe INTEL 256GB gpt disk 256060514304
├─nvme0n1p1 gpt part 4096 1048576 BIOS boot d16d8765-eb51-491a-b5c5-df9286e187f3
├─nvme0n1p2 gpt part 6144 536870912 EFI System 9576796a-0147-4b87-9c5a-bfe5740577b0
├─nvme0n1p3 gpt part 34609152 238340611584 Solaris /usr & Apple ZFS 63c1f21e-b90b-47f5-9ca5-30e70ca1a197
└─nvme0n1p4 gpt part 1054720 17179869184 Linux swap 47bc055a-e609-4426-acf9-0573129a4aea
root@truenas[~]#
lspci
00:00.0 Host bridge: Intel Corporation 8th/9th Gen Core 8-core Desktop Processor Host Bridge/DRAM Registers [Coffee Lake S] (rev 0a)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:15.0 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #7 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation Cannon Point-LP LPC Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
03:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
05:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller
06:00.0 Non-Volatile memory controller: Intel Corporation SSD Pro 7600p/760p/E 6100p Series (rev 03)
sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved
No LSI SAS adapters found! Limited Command Set Available!
ERROR: Command Not allowed without an adapter!
ERROR: Couldn't Create Command -list
Exiting Program.
sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.
No Avago SAS adapters found! Limited Command Set Available!
ERROR: Command Not allowed without an adapter!
ERROR: Couldn't Create Command -list
Exiting Program.
I checked all HDDs and all completed without error. Do you need anything specific from SMART? I ask because there is so much data.
OK, backup done, so the adventures continue.
When I start a SCRUB from the browser menu, it simply does not execute - the disks make no noise, and progress is still at 0% after 7 hours.
When I try to stop it from the terminal, I am told that no SCRUB is running (but of course in the browser it still shows as running):
root@truenas[~]# zpool scrub -s pool_a
cannot cancel scrubbing pool_a: there is no active scrub
When I run a SCRUB from the terminal on pool_a, I do not get any confirmation that the command executed; the cursor just blinks on a new line.
When I run a SCRUB from the terminal on pool_b right after that, the SCRUB executes, and below is the current status of both SCRUBs:
  pool: pool_a
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 156M in 00:01:28 with 592 errors on Mon Oct 7 17:55:18 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool_a                                    ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            2fc128e1-b472-42ea-8a56-715eb5305916  ONLINE       0     0    3K
            39d99900-6aa3-4c05-862e-73f24438a182  ONLINE       0     0 3.01K
            d370c168-7b3b-4c12-9421-af4dd222fc09  ONLINE       0     0 3.01K

errors: 585 data errors, use '-v' for a list

  pool: pool_b
 state: ONLINE
  scan: scrub in progress since Sat Oct 12 14:00:09 2024
        3.01T / 4.05T scanned at 23.5G/s, 13.9G / 4.05T issued at 109M/s
        0B repaired, 0.33% done, 10:50:10 to go
config:

        NAME                                    STATE     READ WRITE CKSUM
        pool_b                                  ONLINE       0     0     0
          c3c95aae-6a89-4494-b079-fc12934f725b  ONLINE       0     0     0

errors: No known data errors
When I try to clear the errors, I am told that pool_a is in read-only mode:
root@truenas[~]# zpool clear pool_a
cannot clear errors for pool_a: pool is read-only
Of course I don't know anything about this, but it looks like the SCRUB for pool_a was being run in a place where this pool isn't.
Do you need anything specific from SMART?
smartctl -a /dev/sdX
for X = b, c, d
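If it helps, one way to dump all three in one go (a sketch, assuming the same sdb/sdc/sdd device names as above):
for X in b c d; do sudo smartctl -a /dev/sd$X; done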
A scrub would require writes.
If you have backed up, the next task is to rebuild. 585 errors is way too many. Some metadata errors may be cleared by getting rid of the corresponding damaged files, but top-level <0x0> is not likely to go away. (And, personally, I would not trust the hardware anyway.)
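For the file-level errors the usual routine is roughly the following (a sketch only - it assumes the pool can eventually be imported read-write, which it currently is not):
sudo zpool status -v pool_a   # lists the damaged files by path (or as <0x...> objects for metadata)
# delete or restore each listed file, then:
sudo zpool clear pool_a
sudo zpool scrub pool_a       # the error list is rebuilt by scrubs, so the count may take a scrub or two to drop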
If you want to play with pool_a before destroying it, I suppose you could unmount/export and then try, in order, the potentially destructive
sudo zpool import -fFn -o altroot=/mnt pool_a
sudo zpool import -fFXn -o altroot=/mnt pool_a
If any result looks interesting, remove the ‘n’ to do it “for real”.
Regarding SMART - all disks report:
SMART overall-health self-assessment test result: PASSED
Yes - I have a backup so I would like to repair it at least for educational purposes, so in order:
zpool offline pool_a
zpool export pool_a
to check:
sudo zpool import -fFn -o altroot=/mnt pool_a
sudo zpool import -fFXn -o altroot=/mnt pool_a
and next
sudo zpool import -fF -o altroot=/mnt pool_a
sudo zpool import -fFX -o altroot=/mnt pool_a
but when do I run any fix or scrub?
sudo zpool scrub pool_a
Is it possible to back up the configuration of permissions, SMB, NFS, or other settings for pool_a?
You may want to check for @Protopia’s advice. But I meant:
sudo zpool import -fFn -o altroot=/mnt pool_a
and, if the message looks interesting
sudo zpool import -fF -o altroot=/mnt pool_a
If not, then the most destructive option:
sudo zpool import -fFXn -o altroot=/mnt pool_a
‘n’ is a dry run. ‘F’ is recovery mode; ‘FX’ discards transactions until it finds an importable state.
If any of these succeeds in importing the pool, scrub should be possible (but not necessarily successful).
root@truenas[~]# sudo zpool import -fFn -o altroot=/mnt pool_a
cannot import 'pool_a': a pool with that name already exists
use the form 'zpool import <pool | id> <newpool>' to give it a new name
Should pool_a first be unmounted? Exported?
Yes, export first.
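So, put together, the whole sequence is roughly this (a sketch; the pool name and altroot are the ones already used above):
sudo zpool export pool_a                         # release the current import first
sudo zpool import -fFn -o altroot=/mnt pool_a    # dry run: reports what -F recovery would do, changes nothing
sudo zpool import -fF -o altroot=/mnt pool_a     # the real recovery import, if the dry run looks sane
sudo zpool scrub pool_a                          # only possible once the pool is imported read-write
with -fFXn / -fFX kept as the last resort.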
After exporting the pool, none of the commands can re-import it:
root@truenas[~]# sudo zpool import -fFn -o altroot=/mnt pool_a
no feedback, no confirmation
root@truenas[~]# sudo zpool import -fF -o altroot=/mnt pool_a
cannot import 'pool_a': insufficient replicas
Destroy and re-create the pool from
a backup source.
root@truenas[~]# sudo zpool import -fFX -o altroot=/mnt pool_a
cannot import 'pool_a': one or more devices is currently unavailable
After an import attempt via the web UI, without success, this alarm appeared:
.....
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1360, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 2095 is not a valid Error
As I said previously, you need to STOP using the UI and rely only on the command shell until you have a reliable pool.
Only then can you work on bringing the UI back into line with the underlying ZFS system.