Resilvering Stuck after changing disk array connector

We have a pool holding a large amount of data with a lot of snapshots (we would also be open to clearing some of those if it could help) in a hard disk enclosure of 8x16TB disks organised in a raidz2. We changed the connection of that enclosure from USB to eSATA in the hope of increasing speeds beyond a meager 40 MB/s. Instead, ZFS started a resilver which progressed for a few minutes and has now been stuck at the same scanned/issued count for a good 4 hours!

We also have a traditional RAID array in an identical enclosure with disks half the size, which has kept working smoothly after the connector change.

$ sudo zpool status -xv
  pool: b
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Oct  4 18:15:06 2024
	33.7G scanned at 2.19M/s, 18.8G issued at 1.23M/s, 95.8T total
	0B resilvered, 0.02% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	b                                         ONLINE       0     0     0
	 raidz1-0                                ONLINE       0    12     0
	   e654fda2-1bab-4dd7-8941-27b7c5399456  ONLINE       3     6     0
	   c3e92d12-8b7d-475c-ac04-2afd4887b551  ONLINE       3    20     0
	   89e8070b-c005-4d31-9159-c2368ffd4be3  ONLINE       3     6     0
	   5b7157fc-7996-4c6b-9600-2b3d93b90bd9  ONLINE       3    22     0
	   3e08e841-46a5-41b2-92d0-13a47c81d6d5  ONLINE       3     6     0
	   e961fda4-7c66-4d3c-9fee-d3346908da32  ONLINE       3     6     0
	   18f61b50-172e-4a17-b0ca-51e219491a8d  ONLINE       3     6     2
	   8ab72240-b593-4094-be04-ab8482f30414  ONLINE       3     6     0

errors: List of errors unavailable: pool I/O is currently suspended

$ sudo zpool iostat -v 1 2
                                            capacity     operations     bandwidth 
pool                                      alloc   free   read  write   read  write
----------------------------------------  -----  -----  -----  -----  -----  -----
b                                         95.8T  35.1T      5      0  40.3K  3.50K
  raidz1-0                                95.8T  35.1T      5      0  40.3K  3.50K
    e654fda2-1bab-4dd7-8941-27b7c5399456      -      -      0      0  5.09K    459
    c3e92d12-8b7d-475c-ac04-2afd4887b551      -      -      0      0  4.98K    447
    89e8070b-c005-4d31-9159-c2368ffd4be3      -      -      0      0  5.02K    456
    5b7157fc-7996-4c6b-9600-2b3d93b90bd9      -      -      0      0  4.87K    437
    3e08e841-46a5-41b2-92d0-13a47c81d6d5      -      -      0      0  5.86K    453
    e961fda4-7c66-4d3c-9fee-d3346908da32      -      -      0      0  5.88K    436
    18f61b50-172e-4a17-b0ca-51e219491a8d      -      -      0      0    104    457
    8ab72240-b593-4094-be04-ab8482f30414      -      -      1      0  8.53K    438
----------------------------------------  -----  -----  -----  -----  -----  -----
boot-pool                                 43.1G  74.9G      0     17  10.7K  1.85M
  nvme1n1p3                               43.1G  74.9G      0     17  10.7K  1.85M
----------------------------------------  -----  -----  -----  -----  -----  -----
eagle                                      219G  2.51T      0      0    699  1.14K
  raidz1-0                                 219G  2.51T      0      0    699  1.14K
    9c2a35d3-9d6e-45b3-aacc-25232bfb7909      -      -      0      0    228    390
    636f3146-83f0-4994-9d86-1dfd36950ff0      -      -      0      0    245    390
    6f2e5f52-c830-428e-8248-09377743ed57      -      -      0      0    225    391
----------------------------------------  -----  -----  -----  -----  -----  -----
nova                                       325G  7.12T      0      0    789  1.21K
  raidz1-0                                 325G  7.12T      0      0    789  1.21K
    925a0ab3-9dda-488f-9497-10d1e210e213      -      -      0      0    195    314
    14a8c6ff-de7c-4f41-a1f4-a759a6cb074b      -      -      0      0    198    311
    f1e680a4-1184-493a-bb2f-da6bcebf0246      -      -      0      0    198    307
    f4eef501-8214-40a9-be25-03a5b4c918e7      -      -      0      0    196    305
----------------------------------------  -----  -----  -----  -----  -----  -----
                                            capacity     operations     bandwidth 
pool                                      alloc   free   read  write   read  write
----------------------------------------  -----  -----  -----  -----  -----  -----
b                                         95.8T  35.1T      0      0      0      0
  raidz1-0                                95.8T  35.1T      0      0      0      0
    e654fda2-1bab-4dd7-8941-27b7c5399456      -      -      0      0      0      0
    c3e92d12-8b7d-475c-ac04-2afd4887b551      -      -      0      0      0      0
    89e8070b-c005-4d31-9159-c2368ffd4be3      -      -      0      0      0      0
    5b7157fc-7996-4c6b-9600-2b3d93b90bd9      -      -      0      0      0      0
    3e08e841-46a5-41b2-92d0-13a47c81d6d5      -      -      0      0      0      0
    e961fda4-7c66-4d3c-9fee-d3346908da32      -      -      0      0      0      0
    18f61b50-172e-4a17-b0ca-51e219491a8d      -      -      0      0      0      0
    8ab72240-b593-4094-be04-ab8482f30414      -      -      0      0      0      0
----------------------------------------  -----  -----  -----  -----  -----  -----
boot-pool                                 43.1G  74.9G      0      0      0      0
  nvme1n1p3                               43.1G  74.9G      0      0      0      0
----------------------------------------  -----  -----  -----  -----  -----  -----
eagle                                      219G  2.51T      0      0      0      0
  raidz1-0                                 219G  2.51T      0      0      0      0
    9c2a35d3-9d6e-45b3-aacc-25232bfb7909      -      -      0      0      0      0
    636f3146-83f0-4994-9d86-1dfd36950ff0      -      -      0      0      0      0
    6f2e5f52-c830-428e-8248-09377743ed57      -      -      0      0      0      0
----------------------------------------  -----  -----  -----  -----  -----  -----
nova                                       325G  7.12T      0      0      0      0
  raidz1-0                                 325G  7.12T      0      0      0      0
    925a0ab3-9dda-488f-9497-10d1e210e213      -      -      0      0      0      0
    14a8c6ff-de7c-4f41-a1f4-a759a6cb074b      -      -      0      0      0      0
    f1e680a4-1184-493a-bb2f-da6bcebf0246      -      -      0      0      0      0
    f4eef501-8214-40a9-be25-03a5b4c918e7      -      -      0      0      0      0
----------------------------------------  -----  -----  -----  -----  -----  -----

What are sensible ways forward here?
Reconnect via USB?
Tweak ZFS Parameters?
Restart the system?
Try mounting the pool?

I scoured the forums but did not really find out what to do.

  1. You say the pool is RAIDZ2, but I think this must be a typo because all the pools in your terminal output show RAIDZ1.

  2. What is/are the model number(s) of the drives in the pool which is resilvering?

  3. You know that USB-connected drives are unsupported for ZFS? However, I am less concerned about the old USB connections and more concerned about the new way the drives are connected.

    USB connections can be multiplexed through a USB hub, so you can have multiple drives connected via a single USB port - and this can cause problems for multi-drive ZFS pools. USB connections often also suffer from disconnects. I am less clear whether you can multiplex multiple drives with eSATA. Please give detailed current connection information for each and every drive in the resilvering pool.
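For example, lsblk’s transport column and the sysfs device paths will show how each disk is attached (standard Linux tools, nothing ZFS-specific):

$ lsblk -d -o NAME,TRAN,MODEL,SIZE   # TRAN column shows usb vs sata for each disk
$ ls -l /sys/block/sd*               # the symlink target shows which controller each disk hangs off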

That is indeed interesting, because the reported storage of 96TB pretty much matches 6 x 16TB, and as you can see there are 8 disks, so everything (including my memory) points to it being a raidz2 - except for the output of ZFS.
Yeah, USB is not the best - that is why we switched. I did not know it was explicitly unsupported though.

We have 8x TOSHIBA_MG09ACA18TE in a Fantec QB-X8US3-6G, previously connected via USB 3 and now via eSATA to a newly acquired PCIe card that has two eSATA ports.

Thanks for your help already! Am a bit stressed out about this situation ^^

Here is lsblk:

admin@truenas[~]$ lsblk -o NAME,MODEL,SIZE,TYPE | grep disk | grep -v zd
sdj         ST8000VN0022-2EL112              7.3T disk
sdk         ST8000VN0022-2EL112              7.3T disk
sdl         ST8000VN0022-2EL112              7.3T disk
sdm         ST8000VN0022-2EL112              7.3T disk
sdn         ST8000VN0022-2EL112              7.3T disk
sdo         ST8000VN0022-2EL112              7.3T disk
sdp         ST8000VN0022-2EL112              7.3T disk
sdq         ST8000VN0022-2EL112              7.3T disk
sdr         SanDisk_SSD_PLUS_1000GB        931.5G disk
sds         SanDisk_SSD_PLUS_1000GB        931.5G disk
sdt         SanDisk_SSD_PLUS_1000GB        931.5G disk
sdu         TOSHIBA_MG09ACA18TE             16.4T disk
sdv         TOSHIBA_MG09ACA18TE             16.4T disk
sdw         TOSHIBA_MG09ACA18TE             16.4T disk
sdx         TOSHIBA_MG09ACA18TE             16.4T disk
sdy         TOSHIBA_MG09ACA18TE             16.4T disk
sdz         TOSHIBA_MG09ACA18TE             16.4T disk
sdaa        TOSHIBA_MG09ACA18TE             16.4T disk
sdab        TOSHIBA_MG09ACA18TE             16.4T disk
nvme2n1     Lexar SSD NM620 2TB              1.9T disk
nvme3n1     Lexar SSD NM620 2TB              1.9T disk
nvme0n1     Lexar SSD NM620 2TB              1.9T disk
nvme4n1     Lexar SSD NM620 2TB              1.9T disk
nvme1n1     Samsung SSD 970 EVO Plus 500GB 465.8G disk

And I would be too.

ZFS doesn't lie: your pools are all RAIDZ1.
Obviously there is some sort of port multiplication going on inside these external cases. This can lead to total data loss.
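You can usually confirm this from the kernel log (just a grep - the exact wording of the messages varies by kernel version):

$ sudo dmesg | grep -iE 'port multiplier|pmp'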

What you want to do is ditch that setup ASAP and go for something like this:

On the server side you need an HBA with external connectors, flashed to IT mode.

Also, an 8-wide raidz1 is scary…


You don’t have 16 x 6TB.

You have 8x 7.3TB and 8x 16.4TB - so you should see maybe c. 45TB in one vDev and c. 105TB in the other, for a total of c. 150TB, not 96TB.

(The 7.3TB drives are marketed as 8TB drives - 8 x 10^12 is c. 7.27 x 2^40, and similarly 18 x 10^12 is c. 16.37 x 2^40.)

As @Farout says, having this many disks and that much storage under a RAIDZ1 is very scary - it is quite possible that a 2nd drive will fail under the stress of a resilver, and then your data is toast.

As for the performance: when you do a resilver it has to read blocks from 7 disks and write them to one, and it does this very heavily - and by multiplexing 8 drives onto a single eSATA connection you are effectively getting a single SATA connection’s worth of bandwidth rather than 8 connections’ worth.
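Rough numbers, assuming the link negotiates SATA III (6 Gb/s, roughly 600 MB/s usable) and ignoring port-multiplier overhead:

	600 MB/s / 8 drives ≈ 75 MB/s per drive, best case

and a resilver’s small random reads plus the multiplier’s command switching will cut that down much further.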

According to the first post it is 0.02% done after 4 hours, so a quick calculation suggests that it would take a total of 20,000 hours, or about 2 years and 4 months, to resilver at that speed. That is, if it were resilvering at all - it does say 0B resilvered.
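For the record, the arithmetic behind that estimate is simply:

	4 h / 0.0002 = 20,000 hours ≈ 833 days of continuous resilvering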

ZFS and TrueNAS do not support traditional RAID arrays either - so your other pool is also at risk.

I am sorry to say that whoever built this server really didn’t understand the hardware requirements for ZFS and you ended up with a very risky set of hardware.

I think you need to start to take urgent and immediate action to seriously rebuild your server using hardware supported by ZFS.

That means buying a decent storage enclosure for all your drives that supports SATA / SAS connections, and a decent HBA to connect them. You cannot convert your RAID1 setup to ZFS in place, so you will also need to buy some extra drives to migrate them to a supported environment. You should also take advantage of this rebuild to move to RAIDZ2, and that will also require extra drives. With a decent plan you should be able to minimise the number of extra disks you need to buy.

This is a non-trivial migration, so you will need someone experienced to plan and execute it.


P.S. A few more comments on your hardware…

  1. I see that there are two open slots on the back of your server letting in dust.
  2. I may be wrong but the back panel looks quite ancient - this may be an opportunity to update all your hardware to avoid future failures from old kit.
  3. You should think about investing in a UPS if you haven’t already got one.

The 8TB drives are from the time before TrueNAS and unrelated to this situation. The 16TB drives make up the array, so I am still very confused how I can end up with 96TB in a raidz1.

We do have a UPS.

Thanks for the insights - that does make sense, though it is going to be interesting to act on… Can I feasibly mount the pool read-only to get our important data out?
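What I have in mind is roughly this (just a sketch - I have not tried it yet, and I realise the export may hang while pool I/O is suspended):

$ sudo zpool export b
$ sudo zpool import -o readonly=on b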

The resilver process is absolutely not linear, and remaining time estimates are dismally bad. ZFS will first read A LOT of metadata, taking a LONG time for no progress, then it will actually proceed with data but linear extrapolations will be totally off.

I shut it down last night after it had been without any progress for a good 10 hours - as I said it did not do 0.02% in 4 hours, it did that in about 20 minutes and then seemingly stalled.

ZFS will first read A LOT of metadata, taking a LONG time for no progress, then it will actually proceed with data but linear extrapolations will be totally off.

@etorix is there any visible reported progress during the metadata reading?

To get the data off the disks, would you suggest going back to USB 3 for now, which used to work, or retrying with eSATA?

This is the “allocated” (used) storage, and there’s 35.1 TB still available.

The enclosure obviously has some kind of controller, but I could not find its reference in the specification or data sheet. I would expect USB to be an extra adapter on top of the SATA controller, but maybe there are two different data paths.

Reported progress is just non-linear.
ZFS wants to access all drives in parallel during a scrub or resilver; if the controller is essentially presenting one drive at a time, everything will slow to a crawl.
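You can watch this from the outside while the resilver is (nominally) running - observation only, nothing that changes the pool:

$ zpool iostat -v b 5    # per-disk ops and bandwidth every 5 seconds; behind a multiplier you will typically see little parallel activity
$ iostat -x 5            # per-device %util from the kernel’s side (sysstat package)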

Honestly, I would power down the server to avoid aggravating the situation, shop around for a SAS enclosure and HBA and move the drives there.

Where is the conflict with the hardware requirements though? Can you point me to a link?
This is supposed to be an affordable data dump; we only ended up with ZFS because we chose TrueNAS to manage it. Getting an expensive HBA and connecting everything manually kind of defeats the purpose, because the speed of the eSATA connection should be enough in practice.

Now that is even more odd: even going with the nominal capacity of the disks of 18 TB, 18 x 7 is 126 TB, which is less than alloc 96 + free 35 = 131 TB.
Something does not add up here.

  • USB connections
  • SATA involving port multiplication

EDIT: And this is because ZFS wants direct access to all the disks simultaneously. Neither of the options above can ensure that.

You don't need to use TrueNAS. Choose a non-ZFS NAS software that works with your current hardware.

The unknown SATA controller in there…

Disk space is never a simple matter with a CoW filesystem.
It roughly adds up if the reported figures are raw disk use, including parity but excluding a few percent for ZFS internal bookkeeping: 8 x 16.4 = 131.2 TiB.
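Concretely, the zpool iostat output above shows alloc 95.8 T + free 35.1 T = 130.9 T, which matches that raw figure.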

You are of course absolutely free to decide whether or not to follow the advice given here by people with experience and knowledge. Just expect us to say “told-you-so” when it all goes wrong in the future.

Documentation

TrueNAS uses ZFS, and ZFS has some explicit hardware requirements. And all bets are off if you decide to ignore these.

Here is what the official TrueNAS Scale 24.04 Hardware Guide says about USB Drives:

Avoid using USB-connected hard disks for primary storage with TrueNAS.

Here is what the official TrueNAS Scale 24.04 Hardware Guide says about Storage Controllers:

There are countless warnings against using hardware RAID cards with TrueNAS. ZFS and TrueNAS provide a built-in RAID that protects your data better than any hardware RAID card. You can use a hardware RAID card if it is all you have, but there are limitations. First and most importantly, do not use their RAID facility if your hardware RAID card supports HBA mode, also known as passthrough or JBOD mode (there is one caveat in the bullet list below). When used, it allows it to perform indistinguishably from a standard HBA. If your RAID card does not have this mode, you can configure a RAID0 for every disk in your system. While not the ideal setup, it works in a pinch.

The TrueNAS Hardware Guide doesn’t mention eSATA ports at all, nor SATA multiplexers of any sort, but it definitely does not say they are supported or recommended.

The ZFS Hardware Guide says the following about USB drives:

These have problems involving sector size reporting, SMART passthrough, the ability to set ERC, and other areas. ZFS will perform as well on such devices as they are capable of allowing, but try to avoid them. They should not be expected to have the same up-time as SAS and SATA drives and should be considered unreliable.

Cost

The most expensive items in a TrueNAS server by far are the disks - at c. $25 / 25€ per TB your disks probably cost c. $5,000 / 5,000€ - and this is c. 6x-10x the rest of the hardware put together. So why skimp on the other hardware and put the availability of your data at risk just for the sake of a relatively small marginal cost?


ZFS and “affordable” do not really play well together.
Second-hand HBAs are not expensive, and you could connect an external SAS shelf with an SFF-8088 or SFF-8644 cable just like you do with eSATA (not “everything manually”). The SAS shelf, though, is likely to cost more than your Fantec.

If you prefer to keep using this hardware, then find enough storage to back up the data, switch to traditional RAID with OpenMediaVault or Unraid, and restore. ZFS with this inadequate enclosure is likely to eventually lead to data loss.
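For the backup step, a minimal sketch (assuming a hypothetical destination pool named backup with enough space; take a recursive snapshot first, or reuse an existing one if the pool is imported read-only):

$ sudo zfs snapshot -r b@migrate
$ sudo zfs send -R b@migrate | sudo zfs receive -uF backup/b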

Resilvering a pool can take a week or more depending on size. A rough guess is around 1TB/hour. So there may not be anything wrong, except impatience.

He has read/write and checksum errors on all the disks…