I’m setting up my new TrueNAS SCALE server and expanding my existing RAIDZ1 pool with a repurposed HDD. The UI has been stuck at 25% on the pool.attach job. I’ve read some of the other posts (like this one here, which sadly did not reach a resolution), done some rudimentary troubleshooting, and learned from the CLI that the expansion is running at ~8MiB/s with an ETA of 14 days, which doesn’t seem normal. Looking at the disk dashboard, the write speed appears to have momentarily peaked at 60MiB/s a couple of times since the start, but the peaks are very short and few.
$ zpool status hdd_array -v
  pool: hdd_array
 state: ONLINE
expand: expansion of raidz1-0 in progress since Sat Jan  4 00:03:08 2025
        480G / 10.4T copied at 8.47M/s, 4.50% done, 14 days 06:31:07 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        hdd_array                                 ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            28b6cbd7-3726-447e-a894-5d618c9ab79f  ONLINE       0     0     0
            c71373d1-b30a-43b9-8f26-51e249021d5d  ONLINE       0     0     0
            1879b64d-45b0-42a9-8d48-b0381f87a21f  ONLINE       0     0     0
            336608bc-9edf-43e8-a630-55f24fbbbced  ONLINE       0     0     0

errors: No known data errors
RAIDZ expansion has to move a lot of data. Assuming that your 3x8TB drives are 75% full, each disk holds about 6TB, so 18TB in total. Expansion needs to spread this across 4 disks, so each disk ends up with c. 4.5TB on it - meaning a total of c. 4.5TB has to move from the existing 3 disks to the extra disk.
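The arithmetic above can be sketched in a one-liner. The 6TB-per-disk figure is an assumption (75% of 8TB), not a measured value:

```shell
# Back-of-the-envelope for RAIDZ expansion data movement.
# Assumed figures: 3 old disks at ~6 TB allocated each, expanding to 4 disks.
awk 'BEGIN {
  disks_before = 3; disks_after = 4; per_disk = 6.0   # TB per disk, assumption
  total = disks_before * per_disk                     # 18 TB allocated incl. parity
  per_disk_after = total / disks_after                # allocation per disk once spread out
  moved = total - disks_before * per_disk_after       # what must migrate to the new disk
  printf "per disk after: %.1f TB, moved: %.1f TB\n", per_disk_after, moved
}'
```

which works out to roughly 4.5TB per disk afterwards, and 4.5TB migrating to the new drive.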
The expansion is (supposed to be) restartable after a reboot - and whilst this is a rarely used function, anecdotes suggest it works. But this means ZFS cannot run at full speed: it needs to keep writing consistency state to the disks so it can resume safely, which makes it slower than you might expect.
People have reported that expansion can take days - and according to the zpool status you only started today (4 Jan) - so my advice is to avoid a reboot unless absolutely necessary and let it run; hopefully it will eventually complete.
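If you want to keep an eye on it without the UI, recent OpenZFS builds can also block on the expansion from a shell. This is a guarded sketch (it degrades gracefully if the commands or pool aren't present), and `zpool wait -t raidz_expand` assumes OpenZFS 2.2+:

```shell
# One-off progress line from the status output; falls back to a message
# if zpool is missing or the pool isn't imported on this host.
if command -v zpool >/dev/null 2>&1; then
  zpool status hdd_array 2>/dev/null | grep -A1 expand || echo "pool hdd_array not found here"
  # zpool wait -t raidz_expand hdd_array   # uncomment to block until expansion finishes
else
  echo "zpool not available on this host"
fi
```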
The disks report 6.95 TiB usage. With parity that would be ~10.5TiB to redistribute. Indeed not a small number.
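As a sanity check on that figure: on a 3-wide RAIDZ1, every full stripe is 2 data blocks + 1 parity, so the allocated total is the data multiplied by 3/2:

```shell
# 6.95 TiB is the usage figure from the post above; width/parity describe
# the original 3-wide RAIDZ1 vdev before expansion.
awk 'BEGIN {
  data = 6.95                      # TiB of data reported by the disks
  width = 3; parity = 1            # 3-wide RAIDZ1
  allocated = data * width / (width - parity)
  printf "%.1f TiB allocated\n", allocated
}'
```

which gives 10.4 TiB - matching the 10.4T total in the zpool status output.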
Thanks, that’s good to know. It’s just baffling to me that the redistribution is so slow - for what it’s worth, almost all the data in the pool initially came from this very disk, and that copy completed nicely overnight (~10h) at 200MB/s. I would be fully on board with this taking a day or three, but 14 days made me think I’m doing something wrong. I did indeed start the process at midnight today, which is now roughly 17 hours ago, and I’m at 513G / 10.4T → 8.4MiB/s average since the start.
I guess I’ll heed your advice and wait it out since I don’t want to lose the data and I’m in no rush and the system will be on anyway.
Sorry - but whilst the total of data blocks including parity is 10.5TB, you only need to move 1/4 of this to the new drive, which is c. 2.6TB.
Once your expansion is complete, full records will be 3 data blocks + 1 parity rather than 2 data blocks + 1 parity. So 3 old records occupy 9 blocks, and the same data could be rewritten using 8 blocks. Small records won’t benefit. So using a rebalancing script to rewrite every file would recover (say) up to 1TB in space.
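Putting numbers on that, using the 6.95 TiB data figure from earlier in the thread - a rough upper bound, since in reality small and partial records won't shrink:

```shell
# Parity overhead before vs after a full rebalance rewrite.
# Before: 2 data + 1 parity per stripe (x1.5). After: 3 data + 1 parity (x1.333).
awk 'BEGIN {
  data = 6.95                          # TiB of data (assumes all-large records)
  before = data * 3 / 2                # allocated on the old 3-wide layout
  after  = data * 4 / 3                # allocated once rewritten at 4-wide
  printf "before rewrite: %.1f TiB, after: %.1f TiB, reclaimable: %.1f TiB\n", before, after, before - after
}'
```

which suggests a ceiling of a bit over 1 TiB reclaimed - consistent with the "up to 1TB" estimate once small records are accounted for.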
Looks like the SATA ports are controlled by an ASM1166 SATA controller, but I don’t know how exactly it’s connected.
I’ll be honest: I can’t make much out of the data I’m pulling or know how to get better info, so if there’s anything more specific or useful about the topology that I can pull from the CLI, please let me know.
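A couple of generic commands that usually reveal the topology. The bus address in the comment is a placeholder, and the whole thing is guarded so it's safe to paste on a box without pciutils:

```shell
# Inspect how the SATA controller hangs off PCIe.
if command -v lspci >/dev/null 2>&1; then
  lspci -nn | grep -i sata || echo "no SATA controller listed"
  # With the bus address found above (e.g. 01:00.0 - placeholder), as root:
  #   lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'
  # LnkCap is the link the device supports; LnkSta is what was actually
  # negotiated (e.g. "Width x2" would confirm the 2-lane bottleneck).
else
  echo "lspci not available (install pciutils)"
fi
# Which block devices sit on which transport (sata/nvme/usb):
lsblk -o NAME,TRAN,MODEL 2>/dev/null || echo "lsblk not available"
```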
6 SATA ports from 2 PCIe lanes. Great for a low-cost platform that is starved of lanes. Not so great for performance when ZFS requests access to all drives simultaneously…
I’m afraid it is. Flexible chipset I/O usually trades one lane of PCIe for one lane of SATA. A SAS HBA would have no trouble repackaging 3 SATA lanes into a single SAS lane, but this low-power ASM1166 certainly doesn’t have the computing power of an LSI 3008.
Also, there is the “7th” drive slot, which houses the other ASMedia chip that allows you to connect 4 NVMe drives. That is connected to the backplane as well, which in turn is connected via these two cables to the mainboard.
Cheap hardware can often only be cheap because compromises are made, and these usually cost performance. By buying cheap hardware you are accepting that performance will be good under low load, but will suffer under stress. If you are looking for peak performance under stress, you would need to pay a lot more for a server architected to achieve that, and also pay for a top-speed LAN infrastructure.
Artificial benchmarks are VERY difficult to get right so that they measure the right things, and the results can be difficult to analyse properly. But because artificial tests stress the system, the one thing they are good at is highlighting all the compromises made to get the price down, regardless of whether you will actually notice the issue in real life. For example, in real life you expect a lot of your reads to be satisfied from memory (ARC, prefetch) - but a benchmark won’t usually reflect this properly: it will either get everything from cache or very little.
In the end, what matters - and the only thing that really matters - is whether the performance you get in real life with a real workload is acceptable. If most of the time you are either 1) streaming (pre-fetched), or 2) reading or writing a handful of smallish files (low volumes), or 3) doing a bulk copy that will take minutes at best and where taking twice as long may not be an issue - then in reality cheap hardware is probably going to do you fine.
I would also like to point out that not being able to get this kind of information from the manufacturer is a major obstacle to troubleshooting. Not being transparent about the hardware is a big red flag to me.
Further testing once the expansion is complete would be appreciated.