Very strange behaviour.
A few days ago I started, with great trepidation, to double the capacity of the above pool using 5 brand new WD Red Pro disks.
The first one, 1/5, completed ok with a couple of CKSUM errors, which I cleared. I then ran the pool for a while with no issues: all online and zero errors. No data was ever affected.
I followed the same process to replace the next drive, 2/5:
- Offline
- Physically remove old disk
- Insert new disk - same slot
- Hit ‘replace’ and select drive from dropdown list (only one entry available, so foolproof, eh?)
- Hit the button and confirm
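For anyone following along from a shell, the GUI steps above correspond roughly to the command sequence sketched below. This is a dry run only: it prints the commands rather than running them. `OLD_DISK` and `NEW_DISK` are placeholder names I made up, and I am assuming the GUI does extra work (such as partitioning the new disk) that this sketch leaves out.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the zpool commands that roughly
# correspond to the GUI steps. Pool and device names are placeholders.
replace_plan() {
    pool=$1   # pool name, e.g. re1
    old=$2    # device being retired, as named in `zpool status`
    new=$3    # replacement device (hypothetical name)
    printf 'zpool offline %s %s\n' "$pool" "$old"
    printf '# physically swap the disk in the same slot\n'
    printf 'zpool replace %s %s %s\n' "$pool" "$old" "$new"
    printf 'zpool status -v %s\n' "$pool"
}

replace_plan re1 OLD_DISK NEW_DISK
```

Printing instead of executing is deliberate, so the plan can be read over before anything touches the pool.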
After some time the status showed reasonable values for resilver progress and estimated time to completion. This is where the weird stuff started.
The first disk I replaced, 1/5, which had been running fine for a while, suddenly developed a few CKSUM errors (53) at about the same time as the ‘replacing’ vdev showed a similar figure (68). 1/5 then started resilvering again, even though it was not part of the replacement of 2/5.
Although I was worried, after some research and building some faith in the process I waited it out, and 2 days 2 hours later both resilvers completed ok.
Again, pool running ok for a while with no errors and everything online and active as expected.
I am now replacing 3/5 and the same thing happened. After about an hour 2/5 threw some errors and spontaneously started to resilver:
[…]
root@rex[~]# zpool status -v -LP re1
pool: re1
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Dec 22 18:43:00 2025
5.03T / 21.6T scanned at 242M/s, 2.99T / 21.6T issued at 144M/s
621G resilvered, 13.85% done, 1 days 13:39:31 to go
config:
NAME                                                              STATE      READ WRITE CKSUM
re1                                                               DEGRADED      0     0     0
  raidz2-0                                                        DEGRADED      0     0     0
    /dev/sdh2                                                     ONLINE        0     0     0
    /dev/sdk1                                                     ONLINE        0     0    13  (resilvering)
    /dev/sda2                                                     ONLINE        0     0     0
    /dev/sde1                                                     ONLINE        0     0     0
    replacing-4                                                   DEGRADED      0     0    11
      /dev/disk/by-partuuid/c2400e8f-3476-483d-a675-873e32aff0b8  OFFLINE       0     0     0
      /dev/sdf1                                                   ONLINE        0     0     0  (resilvering)
errors: No known data errors
[…]
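To keep an eye on which devices are picking up checksum errors without reading the whole table each time, something like the following could be used. It is a minimal sketch that assumes the column layout shown above (NAME STATE READ WRITE CKSUM); `cksum_report` is just a helper name I made up, not part of ZFS.

```shell
#!/bin/sh
# cksum_report: read `zpool status` text on stdin and print every device
# row whose CKSUM column (5th field) is not zero.
cksum_report() {
    awk '
        # Device rows have at least 5 fields with counters in fields 3-5.
        # Counters may carry a unit suffix (e.g. 1.2K) when they get large.
        NF >= 5 &&
        $3 ~ /^[0-9.]+[KMGTP]?$/ &&
        $4 ~ /^[0-9.]+[KMGTP]?$/ &&
        $5 ~ /^[0-9.]+[KMGTP]?$/ &&
        $5 != "0" {
            # Keep a trailing note such as "(resilvering)" if present.
            note = (NF > 5) ? (" " $6) : ""
            print $1 " cksum=" $5 note
        }
    '
}

# Usage on a live system (not run here):
#   zpool status -LP re1 | cksum_report
```

Fed the output above, this would list only sdk1 and the replacing-4 vdev, which are the two rows with non-zero CKSUM counts.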
As you can see, sdk1 (2/5) had been running error-free until I started replacing 3/5, whose new disk is sdf1.
Throughout this process, the GUI never showed the repeated resilver, only the current one.
So, I am hoping this is not serious and when I finally replace 5/5 I will be able to expand the pool into the new capacity.
;-^}
P