I rebooted, but when it started there were no datasets and no pools, so I ran your command again. Now I see the datasets but no pool.
Try exporting again, then:
zpool import -F -o altroot=/mnt HDD
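For the record, the full sequence would be something like this (assuming the pool is still named HDD):

```
zpool export HDD                       # cleanly release the pool first
zpool import -F -o altroot=/mnt HDD    # then re-import it in recovery mode under /mnt
```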
I tried. After the reset there are no pools and no datasets, so I can only run
zpool import -F -o altroot=/mnt HDD
but again I see the datasets and no pool. I restarted again and under Pools I see no pool, but if I click Import it shows I can import HDD, so I’m importing from the GUI.
OK, it finished importing from the GUI and everything looks fine now: the pool is there with no errors, and the datasets and apps are back.
A huge thank you for your help. Now, if I can bother you a bit more: how do I find out whether any of the drives are actually bad?
I didn’t really help.
Better copy away your important data…
I think I’ll replace discs a and b, whichever they are, one by one. Can I mix them with Seagate drives? Maybe I’ll have better luck.
Yes. Just make sure they are CMR drives and not SMR, and of the same size.
Thank you again. I’m running a short SMART test on all drives.
You should do long tests.
Also, when replacing discs in a dodgy raidz1, it’s best to replace drives while the old ones are still connected.
Don’t I need to export the faulty disk, connect the new one instead, and let the system resilver?
If you have a spare port, do not offline anything: plug in the new drive, go to Storage > Pool > Status, select a drive and click “Replace”. ZFS will resilver to replace the drive and then offline the old drive; remove the old drive, rinse and repeat.
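For reference, the command-line equivalent would be roughly the following - the GUI route above is safer, and the names in angle brackets are placeholders, not your actual values:

```
# <old-disk-partuuid> is the partuuid of the drive being replaced, as shown in zpool status;
# <new-disk> is the freshly connected drive, e.g. /dev/sde
zpool replace HDD <old-disk-partuuid> <new-disk>

# watch the resilver progress
zpool status HDD
```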
@smic717394 You got lucky when you did as the error message suggested and ran a zpool import -F - you could easily have made things worse by trying random commands.
Take my advice and SLOOOOOW DOWN, and wait for expert advice from people who have some knowledge and understanding of ZFS.
So, before you decide to swap out drives you need to establish what problems you now have and whether a drive actually needs to be swapped out. If you have some other sort of problem and you attempt to swap out a drive you may end up losing your data.
So, here is my opinion on what you need to do:

- Run SMART Long tests on each drive (see the command sketch after this list). Once they have all finished…
- Run smartctl -x /dev/sdX for each drive again, and post the responses, this time making sure to follow my previous instructions about enclosing them with lines containing ``` so that the output is readable. The last lot were 100x more difficult to interpret because you didn’t do this - but from what I could tell from trying to read this mess, all the drives looked fine.
- Reboot and check the pool comes online automatically again - maybe repeat this to be doubly sure.
- Let us analyse the smartctl output and advise on whether your disks have a problem or not, and if so what to do about it.
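To make those first two steps concrete, the commands would look something like this (device names are assumptions; substitute your actual drives):

```
# start a long (extended) self-test - repeat for each of the four disks
smartctl -t long /dev/sda

# after the tests have completed (several hours later), capture the full details
smartctl -x /dev/sda
```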
Agreed… I didn’t just run the zpool import -f out of nowhere; I was told here to run the command zpool import, and when I ran it the TrueNAS message said I should run zpool import -f. Anyway, the advice is good and appreciated.
I’m running a long test on the first disk; it’s at 40%, so we’ll see how it goes. When I’ve finished them all I’ll post the results.
So far Topology, Usage, ZFS Health and Disk Health are all green. I’m thinking maybe the hot-swap connectors on the back of some bays are bad, or the power supply, but it’s early to say. It’s still running on the first disk, at 50% after about 3 hours.
Anyway, I really appreciate all the help from you guys.
I forgot to mention that after I imported the pool and everything was working, I later got a notification:
```
ZFS has finished a resilver:

   eid: 28
 class: resilver_finish
  host: HomeServer
  time: 2024-11-20 14:25:03+0100
  pool: HDD
 state: ONLINE
  scan: resilvered 696K in 00:00:03 with 0 errors on Wed Nov 20 14:25:03 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        HDD                                       ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            73e8954f-fd43-4770-a5f8-78faa91fd6ee  ONLINE       0     0     0
            25087691-9aac-45c6-99a2-741fcda14a58  ONLINE       0     0     0
            f4d8b7aa-08e7-49b8-a06e-c0c1bca59d58  ONLINE       0     0     0
            c842d9a6-4ca8-4477-81aa-cae554d19506  ONLINE       0     0     0

errors: No known data errors
```
And looking at the error notification from this morning, is there any way to identify the drive from this info?
```
The number of I/O errors associated with a ZFS device exceeded
acceptable levels. ZFS has marked the device as faulted.

 impact: Fault tolerance of the pool may be compromised.
    eid: 19
  class: statechange
  state: FAULTED
   host: HomeServer
   time: 2024-11-20 05:34:37+0100
  vpath: /dev/disk/by-partuuid/f4d8b7aa-08e7-49b8-a06e-c0c1bca59d58
  vguid: 0xC6B9449BC950CBC3
   pool: HDD (0x05C0866261A9460F)
```
@smic717394 A little knowledge is a dangerous thing. zpool import -f and zpool import -F are very different things. You REALLY need to know what you are doing when you issue ZFS console commands, especially when you do it as root where there are few if any safety nets.
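Roughly speaking (check the zpool-import man page for the exact wording):

```
# -f  forces the import of a pool that looks like it is in use by another system
zpool import -f HDD

# -F  is recovery mode: it winds back the last few transactions to get the pool
#     into an importable state, and can therefore lose the most recent writes
zpool import -F HDD
```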
To save time, you can run the SMART long tests in parallel as each one only involves the drive you run it on.
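For example, something along these lines would start them all at once (device names are assumptions; adjust to your actual disks):

```
# kick off a long self-test on every data disk; the tests run inside the drives themselves
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
  smartctl -t long "$d"
done
```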
The resilver message and the most recent zpool status show zero errors. The details of which drive had errors previously may be available in the system logs; someone else will need to tell you which commands to run to check for this. But let’s wait until we see the smartctl -x output once the long tests have finished and see if that helps identify the root cause.
Once the system is fully stable you should implement @joeschmuck’s Multi-Report script so that you get early warnings by email of any disk issues.
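In the meantime, the fault notification itself already points at the disk: the vpath is a partition UUID, and you can resolve it to the current device name, for example (the UUID below is the one from your notification; /dev/sdX names can change between boots, so treat the result as a snapshot):

```
# resolve the faulted partition's partuuid to its current /dev/sdX name
readlink -f /dev/disk/by-partuuid/f4d8b7aa-08e7-49b8-a06e-c0c1bca59d58
```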
Still running the SMART tests, but I think I identified sda2 and sdb2.
Running zpool status I get
```
  pool: HDD
 state: ONLINE
  scan: scrub canceled on Wed Nov 20 14:37:59 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        HDD                                       ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            73e8954f-fd43-4770-a5f8-78faa91fd6ee  ONLINE       0     0     0
            25087691-9aac-45c6-99a2-741fcda14a58  ONLINE       0     0     0
            f4d8b7aa-08e7-49b8-a06e-c0c1bca59d58  ONLINE       0     0     0
            c842d9a6-4ca8-4477-81aa-cae554d19506  ONLINE       0     0     0

errors: No known data errors
```
Then running zpool status -LP HDD I get
```
root@HomeServer[~]# zpool status -LP HDD
  pool: HDD
 state: ONLINE
  scan: scrub canceled on Wed Nov 20 14:37:59 2024
config:

        NAME           STATE     READ WRITE CKSUM
        HDD            ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            /dev/sdb2  ONLINE       0     0     0
            /dev/sdd2  ONLINE       0     0     0
            /dev/sda2  ONLINE       0     0     0
            /dev/sdc2  ONLINE       0     0     0

errors: No known data errors
```
So I guess:
sdb2: 73e8954f-fd43-4770-a5f8-78faa91fd6ee
sdd2: 25087691-9aac-45c6-99a2-741fcda14a58
sda2: f4d8b7aa-08e7-49b8-a06e-c0c1bca59d58
sdc2: c842d9a6-4ca8-4477-81aa-cae554d19506
Then by running dd if=/dev/sdX2 of=/dev/null bs=1M count=5000 against each one I can see which physical drive is which.
It is a reasonable assumption that the devices are shown in the same order, but it is still an assumption rather than a definitive extrapolation.
The correct way to do this is using lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID which will list the device name /dev/sdX1 and the associated UUID, which you can then use to map against the UUIDs in the pool.
Also, to identify drives by flashing lights, the dd command is probably an OK way to do it, but you can probably specify the device by UUID, i.e. dd if=/dev/disk/by-uuid/25087691-9aac-45c6-99a2-741fcda14a58 of=/dev/null bs=1M count=5000.
I like this command. Thank you. For the second one I had to change dd if=/dev/disk/by-uuid/ to dd if=/dev/disk/by-partuuid/ because I see the disks are divided into 2 partitions, a swap sda1 and the main partition sda2. Not sure if this is OK.
```
NAME   MODEL                ROTA PTTYPE TYPE    START          SIZE PARTTYPENAME             PARTUUID
sda    WDC WD40EFZX-68AWUN0    1 gpt    disk          4000787030016
├─sda1                         1 gpt    part      128    2147418624 Linux swap               d3c6800b-f1f3-48a5-8732-ffa98f2d65f7
└─sda2                         1 gpt    part  4194432 3998639463936 Solaris /usr & Apple ZFS f4d8b7aa-08e7-49b8-a06e-c0c1bca59d58
```
Sorry - my mistake.
OK, so I’ve done a long SMART test on all 4 drives, one by one; each took about 7 hours, and they all show a pass.
What else could it be?