Problem extending pool - pool.attach 25%

Does it make a big difference if the cache is mismatched?

Also I wanted to ask: is it possible to reboot the NAS during the expansion process, or is there a chance the data will be lost?

I don’t think the cache sizes mismatching will make a difference.

And I believe I read here that expansion will restart where it left off after a reboot.

Mine has been running for 38 days now and is only up to 55% completed…
And once this drive is done being added, I have another one to add…

So about 4 months to expand 2 disks… Crazy

root@truenas[~]# zpool status data
  pool: data
 state: ONLINE
  scan: resilvered 3.90T in 16:53:25 with 0 errors on Mon Nov  4 10:49:32 2024
expand: expansion of raidz1-0 in progress since Fri Nov  1 17:43:17 2024
        9.14T / 16.4T copied at 2.71M/s, 55.60% done, (copy is slow, no estimated time)
config:

    NAME                                      STATE     READ WRITE CKSUM
    data                                      ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        e077b967-57eb-4cb5-abb2-149d99dfdae0  ONLINE       0     0     0
        b7683a54-6aff-8b45-b68b-0e7fd4361d93  ONLINE       0     0     0
        d8390cb5-0673-3742-a323-f8bd762ea8ec  ONLINE       0     0     0
        13178070-5ded-1a42-ab01-950485d986be  ONLINE       0     0     0
        473195e3-0769-4497-ae57-c9720ed5278d  ONLINE       0     0     0
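
For context, those numbers pencil out to roughly another month for this one disk. A quick back-of-envelope check (assuming ZFS's T/M suffixes mean TiB/MiB; the reported rate also fluctuates, so this is only a rough sketch, not an official estimate):

```python
# Rough ETA from the "9.14T / 16.4T copied at 2.71M/s" status line above.
# Assumes ZFS reports TiB and MiB; the instantaneous rate varies, so treat
# this as an order-of-magnitude estimate, not a schedule.
copied_tib = 9.14
total_tib = 16.4
rate_bytes_per_s = 2.71 * 2**20  # 2.71 MiB/s

remaining_bytes = (total_tib - copied_tib) * 2**40
eta_days = remaining_bytes / rate_bytes_per_s / 86400
print(f"~{eta_days:.0f} days remaining at the current rate")
```

Which is consistent with the "about 4 months to expand 2 disks" estimate above.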

alright, thanks!
Yeah, I read that too, but I don’t want to take that risk so I will let it finish first.

That’s very slow - 23.68% after 13 days here. Are your disks SMR or CMR?

Read above… There’s one SMR drive in the mix.

What are the exact models of your disks? Are they SMR disks?

After reading this thread I’m quite happy with my ~8 days of expanding at ~50-55MB/s.
I already thought that was slow…

They are all Western Digital Red 6TB drives… but now that I am looking at the specific model numbers and cross-referencing their specs:

I have 4 WD60EFAX drives (SMR) in the pool now… but the one I am adding to the pool to expand is a WD60EFRX (CMR)
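
The distinction really does come down to the model suffix. A tiny illustrative lookup (covering just the two models named in this thread; anything else is unknown):

```python
# Recording technology of the two WD Red 6TB models discussed here.
# WD documents the EFAX variant as SMR and the EFRX variant as CMR.
WD_RED_6TB = {
    "WD60EFAX": "SMR",  # shingled: slow sustained writes, painful resilvers
    "WD60EFRX": "CMR",  # conventional: steady write performance
}

def recording_tech(model: str) -> str:
    """Return 'SMR', 'CMR', or 'unknown' for a model string (illustrative helper)."""
    return WD_RED_6TB.get(model.upper(), "unknown")

print(recording_tech("WD60EFAX"))  # SMR
```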

Very very very bad news. I know it will be expensive, but you really really need to consider replacing all the EFAX drives with EFRX ones.

What’s the benefit I would see? I don’t have any complaints around speed/performance, and I have seen no reliability issues either. Would swapping all the drives only benefit me during this expansion process (essentially a one-time operation, as I’m maxing out the bays)?

You don’t have any complaints about performance YET, but if you ever need to resilver or want to expand the pool again, you definitely will. More importantly, during a resilver the stress on the remaining drives will last roughly 10x longer than it would with CMR drives, and the risk of losing another drive, and thus your entire pool, during the resilver will be substantially higher.

But these are your drives, your data, your risk, and it is entirely up to you how much risk of losing your pool completely that you are prepared to take.

P.S. With 24TB usable space, most of the experienced people here would probably have gone for RAIDZ2. But you cannot change the RAIDZ level once it is set, so, like me (I made the same decision with my 5x 4TB CMR disks), you are stuck with RAIDZ1 and its associated risks (but with bigger disks, more data, and added SMR).

Thanks for the information. I don’t really know what SMR vs CMR means. I’ll have to do some research to understand how to compare these technologies and what it means. From your context I gather that CMR > SMR in some ways.

At the time of my build I had 4 drives on a budget, and double parity would not have yielded me enough usable capacity, so I opted for single parity (with the associated risk). Luckily this is just a home lab and everything is backed up off-site nightly, so I’m not overly worried about loss in the event of a 2-drive failure. It would suck… don’t get me wrong… but I would recover eventually.

I recently was gifted 6 of these CMR drives and decided to add them to my 6-bay NAS to max out my bays and expand capacity, but since I have enough of them I can go through and swap each of the SMRs for the CMRs I now have. Of course I don’t want to try to replace a drive while the expansion is running… so I’ll have to wait a month at this point for it to finish, I suppose.

Well, the expansion may not take a month. It might be a bit less, or it might error out partway through.

In another thread, someone’s new SMR drive errored out during expansion, and the choice was to resilver that drive first when it was only partly expanded or to clear the error and let it continue (possibly needing to do that several times). The consensus (for a pool with all CMRs except for the one SMR that had faulted) was to replace the SMR with a CMR, resilver it, and then let the expansion complete.

Your situation is a bit different because you have multiple SMR drives in the pool, and the expansion has not errored out. But I think you still have a choice:

  1. You let the expansion continue until it completes or errors and then replace it.
  2. You offline the new drive, replace and resilver and then let the expansion continue.

I have literally no idea which would be best.
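
If you went with option 2, the mechanics underneath would be the usual offline/replace dance. A hedged sketch (the GPTIDs are placeholders, and `DRY_RUN=1` only prints the commands, so nothing here touches a real pool; on TrueNAS you would normally do this through the UI anyway):

```shell
#!/bin/sh
# Sketch of option 2: swap an SMR member for a CMR drive.
# GPTIDs below are placeholders; take real ones from `zpool status data`.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"   # dry run: print the command instead of executing it
    else
        "$@"
    fi
}

OLD_GPTID="b7683a54-6aff-8b45-b68b-0e7fd4361d93"      # example SMR member
NEW_DEV="/dev/disk/by-partuuid/your-new-cmr-partuuid" # placeholder CMR drive

run zpool offline data "$OLD_GPTID"            # take the SMR drive offline
run zpool replace data "$OLD_GPTID" "$NEW_DEV" # resilver onto the CMR drive
run zpool status data                          # watch the resilver progress
```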

This is a good topic.
I had the same issue: extending the pool with a 4th disk was crawling at 3-4 MB/s, which is insanely slow. It would have taken a month to extend the zpool, even though the pool consists of only CMR drives (WD DC Ultrastar).

First it was like this after progressing the whole night.

  pool: silo
 state: ONLINE
  scan: resilvered 4.42T in 1 days 08:07:49 with 0 errors on Wed May  7 18:03:59 2025
expand: expansion of raidz1-0 in progress since Wed May  7 18:44:55 2025
        186G / 13.3T copied at 4.16M/s, 1.36% done, (copy is slow, no estimated time)
config:

        NAME                                      STATE     READ WRITE CKSUM
        silo                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            1ee7c057-25d7-4522-8887-b72d36d0f61d  ONLINE       0     0     0
            5c55be94-d27f-4f99-9bbf-8bf2005e2d15  ONLINE       0     0     0
            280e6cc8-15ab-4f75-803b-59865ffafbd1  ONLINE       0     0     0
            007f8070-c8ce-4d94-8c16-42bcd9f25dff  ONLINE       0     0     0

errors: No known data errors

Then I read one of the comments and verified the cache values myself:

# for disk in /dev/sd?; do hdparm -W "$disk"; done

/dev/sda:
 write-caching =  0 (off)

/dev/sdb:
 write-caching =  0 (off)

/dev/sdc:
 write-caching =  0 (off)

/dev/sdd:
 write-caching =  0 (off)

/dev/sde:
 write-caching =  1 (on)

I enabled the write-caching for all data drives to see if it makes any difference:

# sudo hdparm -W 1 /dev/sdb

/dev/sdb:
 setting drive write-caching to 1 (on)
 write-caching =  1 (on)
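
One caveat worth noting: `hdparm -W` settings typically do not survive a reboot. On Debian-based systems they can be pinned in `/etc/hdparm.conf` (the path and syntax are distro-specific, and TrueNAS SCALE manages disk settings its own way, so treat this as a generic Linux illustration):

```
# /etc/hdparm.conf (Debian-style; illustrative entry)
/dev/sdb {
    write_cache = on
}
```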

Sure enough, this seems to have resolved the slow extending issue:

# zpool status silo 
  pool: silo
 state: ONLINE
  scan: resilvered 4.42T in 1 days 08:07:49 with 0 errors on Wed May  7 18:03:59 2025
expand: expansion of raidz1-0 in progress since Wed May  7 18:44:55 2025
        2.73T / 13.3T copied at 40.5M/s, 20.44% done, 3 days 04:25:38 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        silo                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            1ee7c057-25d7-4522-8887-b72d36d0f61d  ONLINE       0     0     0
            5c55be94-d27f-4f99-9bbf-8bf2005e2d15  ONLINE       0     0     0
            280e6cc8-15ab-4f75-803b-59865ffafbd1  ONLINE       0     0     0
            007f8070-c8ce-4d94-8c16-42bcd9f25dff  ONLINE       0     0     0

errors: No known data errors

It’s now at 40.5M/s and growing 🙂

Thanks to all the comments in this topic!
