I’ve already considered many areas where it needs to work with the middleware and GUI. I’m not a programmer myself, but I have a basic understanding of the places that need “checks” before proceeding.
Here’s an example.
You open the page in the GUI to remove a vdev from the pool. One of the middleware’s “things to check” is “Does this pool have a checkpoint?” If so, it will alert the user and stop there.
The above scenario is actually ideal, since it favors safety over speed and convenience when it comes to managing pools through the GUI.
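A rough shell sketch of the kind of pre-flight check described above. This is not TrueNAS middleware code; it only illustrates the idea with the ZFS CLI, assuming that `zpool get -H -o value checkpoint POOL` prints `-` when the pool has no checkpoint. The function names `has_checkpoint` and `safe_remove_vdev` are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check before removing a vdev.
# NOT TrueNAS middleware code; helper names are hypothetical.

# Returns 0 (true) if the value of the pool's "checkpoint" property says a
# checkpoint exists. Assumption: the property value is "-" when there is none.
has_checkpoint() {
    local value="$1"
    [ -n "$value" ] && [ "$value" != "-" ]
}

# Refuses to remove VDEV from POOL while a checkpoint exists;
# otherwise hands the removal to `zpool remove`.
safe_remove_vdev() {
    local pool="$1" vdev="$2" value
    # -H: no header, tab-separated; -o value: print only the value column
    value=$(zpool get -H -o value checkpoint "$pool") || return 1
    if has_checkpoint "$value"; then
        echo "Pool '$pool' has a checkpoint ($value)." >&2
        echo "Discard it first with: zpool checkpoint -d $pool" >&2
        return 1
    fi
    zpool remove "$pool" "$vdev"
}
```

For example, `safe_remove_vdev tank mirror-1` (placeholder names) would stop with a message instead of continuing when `tank` has a checkpoint, which is the “alert the user without continuing” behavior described above.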
Ask any user with a NAS: “Would you rather have no safety net whatsoever when managing your pool, or would you be okay with a rudimentary safety net, at the cost of possibly being ‘inconvenienced’ when trying to remove a vdev?”
I’m sure most users would say: “That’s totally fine. If removing a vdev takes a minute longer because I have to remove a checkpoint first, the added safety net is worth it if it gives me a chance to avoid an irreversible disaster or accident.”
First of all, I managed to recover almost all my data. I say “almost” because I might have missed some files without noticing, but I made backups of the most important ones.
I saved everything I needed to the 4TB drive I bought, connected with a SATA-to-USB cable. It took quite a while because of the low transfer speed.
I’ve just finished that step and started importing the pool following @HoneyBadger’s commands:
root@truenas[~]# zpool export TankPrincipal
cannot export 'TankPrincipal': pool is busy
root@truenas[~]#
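When `zpool export` reports `pool is busy`, something is often still using one of the pool's mounted datasets. A hedged shell sketch for finding out what, assuming `fuser` (from psmisc) is installed; `mounted_paths` and `show_pool_users` are hypothetical helper names, not TrueNAS tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for investigating "cannot export ...: pool is busy".

# Reads `zfs list -H -o name,mountpoint,mounted` output (tab-separated) on
# stdin and prints the mountpoint of every dataset that is mounted at an
# absolute path. Rows whose mountpoint is "none", "legacy" or "-" (zvols)
# are skipped.
mounted_paths() {
    awk 'BEGIN { FS = "\t" } $3 == "yes" && $2 ~ /^\// { print $2 }'
}

# For each mounted dataset of POOL, lists the processes using that
# filesystem. fuser -m treats the path as a mounted filesystem, -v gives a
# verbose report (printed on stderr).
show_pool_users() {
    local pool="$1" mp
    zfs list -H -o name,mountpoint,mounted -r "$pool" | mounted_paths |
        while IFS= read -r mp; do
            echo "== $mp"
            fuser -vm "$mp"
        done
}
```

Running `show_pool_users TankPrincipal` before retrying the export would show which processes still hold the pool. Stopping those (or rebooting, as was done here) releases them.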
I ran into the same issue we had when trying to mount the pool after the import. However, I didn’t spend much time troubleshooting it; I simply rebooted the server and ran the following command a few moments ago:
Next steps:
If the import completes successfully and the server is back up and running, I’ll run the script suggested by @MSameer (thanks, @MSameer!) on the new 4TB drive (where I have backed up my data) to check it before using it to replace the one with bad sectors.
After that, I’ll create snapshots for the most important datasets.
But I will run that script AFTER the import that is currently running completes successfully.
Once I’ve made sure that the second -R import is OK, I’ll replace the dying drive. I’ll take snapshots of the important datasets and then think about a three-copy backup strategy.
What’s wrong with this?
Edit:
I think I’ll proceed like this after the import:
I’ll disconnect the dying drive.
Let the pool resilver. If anything goes wrong, I still have the data on the new hard drive.
If the pool is up and running after the resilver, I’ll run the script that checks the health of the new drive, and if it’s fine, I’ll put it in place.
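For reference, on the ZFS side a disk swap in a raidz1 vdev usually comes down to `zpool replace`, and the resilver only starts once a new disk has been given to it. Simply disconnecting the old disk leaves the pool DEGRADED, with no redundancy, until then. On TrueNAS the GUI's disk Replace action is normally the preferred route, since it prepares the new disk the way the middleware expects. The sketch below only illustrates the underlying steps; `resilver_running` and `replace_and_wait` are hypothetical helpers, and the device names must come from your own `zpool status` output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating a disk replacement plus resilver wait.

# Returns 0 (true) while the `zpool status` text on stdin reports a resilver
# that is still running ("scan: resilver in progress since ...").
resilver_running() {
    grep -q 'resilver in progress'
}

# Replaces OLD_DEV with NEW_DEV in POOL, then polls until the resilver is
# no longer reported as in progress. OLD_DEV/NEW_DEV are placeholders.
replace_and_wait() {
    local pool="$1" old_dev="$2" new_dev="$3"
    zpool replace "$pool" "$old_dev" "$new_dev" || return 1
    sleep 5   # give the resilver a moment to appear in the status output
    while zpool status "$pool" | resilver_running; do
        sleep 60
    done
    zpool status "$pool"   # final state: check for errors before moving on
}
```

On OpenZFS 2.0 and later, `zpool wait -t resilver POOL` can take the place of the polling loop.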
root@truenas[~]# zpool status
  pool: TankPrincipal
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 04:38:20 with 0 errors on Sat Nov 9 19:38:22 2024
config:
        NAME                                                STATE     READ WRITE CKSUM
        TankPrincipal                                       ONLINE       0     0     0
          raidz1-0                                          ONLINE       0     0     0
            ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1878897-part2  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1852520-part2  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68WT0N0_WD-WCC4EHAH8F79-part2  ONLINE       0     0     0
errors: No known data errors
  pool: boot-pool
 state: ONLINE
config:
        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdc3      ONLINE       0     0     0
errors: No known data errors
Is the pool completely online? Do I need to do anything else? Maybe it has something to do with the -N option?
root@truenas[~]# zpool get all TankPrincipal | grep feature
TankPrincipal feature@async_destroy enabled local
TankPrincipal feature@empty_bpobj active local
TankPrincipal feature@lz4_compress active local
TankPrincipal feature@multi_vdev_crash_dump enabled local
TankPrincipal feature@spacemap_histogram active local
TankPrincipal feature@enabled_txg active local
TankPrincipal feature@hole_birth active local
TankPrincipal feature@extensible_dataset active local
TankPrincipal feature@embedded_data active local
TankPrincipal feature@bookmarks enabled local
TankPrincipal feature@filesystem_limits enabled local
TankPrincipal feature@large_blocks enabled local
TankPrincipal feature@large_dnode disabled local
TankPrincipal feature@sha512 enabled local
TankPrincipal feature@skein enabled local
TankPrincipal feature@edonr disabled local
TankPrincipal feature@userobj_accounting disabled local
TankPrincipal feature@encryption disabled local
TankPrincipal feature@project_quota disabled local
TankPrincipal feature@device_removal disabled local
TankPrincipal feature@obsolete_counts disabled local
TankPrincipal feature@zpool_checkpoint disabled local
TankPrincipal feature@spacemap_v2 disabled local
TankPrincipal feature@allocation_classes disabled local
TankPrincipal feature@resilver_defer disabled local
TankPrincipal feature@bookmark_v2 disabled local
TankPrincipal feature@redaction_bookmarks disabled local
TankPrincipal feature@redacted_datasets disabled local
TankPrincipal feature@bookmark_written disabled local
TankPrincipal feature@log_spacemap disabled local
TankPrincipal feature@livelist disabled local
TankPrincipal feature@device_rebuild disabled local
TankPrincipal feature@zstd_compress disabled local
TankPrincipal feature@draid disabled local
TankPrincipal feature@zilsaxattr disabled local
TankPrincipal feature@head_errlog disabled local
TankPrincipal feature@blake3 disabled local
TankPrincipal feature@block_cloning disabled local
TankPrincipal feature@vdev_zaps_v2 disabled local
TankPrincipal feature@redaction_list_spill disabled local
TankPrincipal feature@raidz_expansion disabled local
TankPrincipal feature@fast_dedup disabled local
root@truenas[~]#
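The `status:` message in the `zpool status` output above is informational: the pool reports `ONLINE` with no known data errors, and the listed features are simply not enabled yet. As the `action:` text says, enabling them with `zpool upgrade` can make the pool inaccessible to software that does not support those features. A small sketch, using the hypothetical helper `disabled_features`, that extracts which features are still disabled from the `zpool get all` output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `zpool get all POOL` output on stdin
# (columns: NAME PROPERTY VALUE SOURCE) and prints the name of every
# feature flag whose value is "disabled", without the "feature@" prefix.
disabled_features() {
    awk '$2 ~ /^feature@/ && $3 == "disabled" {
        sub(/^feature@/, "", $2)
        print $2
    }'
}
```

Usage: `zpool get all TankPrincipal | disabled_features` lists names such as `large_dnode` and `draid` from the output above; `| wc -l` on the same pipeline counts them.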
@HoneyBadger, will a reboot automatically import the pool without the “no mount” (-N) flag? In other words, is all the command-line work “done”, and is it now back to business as usual with the GUI only?
I believe that after a successful (non-read-only) import from the command line, you’re supposed to reboot the server, which will then handle things automatically as usual.
@Berboo, your services and middleware will be upset about the pool going offline and then online again. If it was successfully imported in read/write mode, a subsequent reboot should import it without issue.
root@truenas[~]# zpool status
  pool: boot-pool
 state: ONLINE
config:
        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdd3      ONLINE       0     0     0
errors: No known data errors
root@truenas[~]#
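The output above lists only `boot-pool`, so `TankPrincipal` was not imported at boot. A minimal sketch for checking that from a script, with the hypothetical helpers `pool_imported` and `check_pool`. Note that `zpool import` with no arguments only lists pools available for import; it does not import anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking whether a pool is currently imported.

# Returns 0 (true) if POOL appears as a whole line in the pool-name list
# read from stdin, i.e. the output of `zpool list -H -o name`.
pool_imported() {
    local pool="$1"
    grep -qxF -- "$pool"   # -x whole line, -F fixed string (no regex)
}

# Reports whether POOL is imported; if not, shows the importable pools.
check_pool() {
    local pool="$1"
    if zpool list -H -o name | pool_imported "$pool"; then
        echo "$pool is imported"
    else
        echo "$pool is not imported; pools available for import:"
        zpool import   # without arguments: list only, nothing is imported
    fi
}
```

`check_pool TankPrincipal` would report the missing pool here. On TrueNAS, importing it through the GUI rather than the shell keeps the middleware aware of the pool.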