Migrating to temporarily smaller pool size/disks

OK, so I have asked about this recently and need a plan/some help.

I have a 12TB or thereabouts main pool.

4 drives (2TB each) in the main internal bays, and the rest split across other pools in a 6-drive JBOD enclosure.

The main pool has 6.11TB used.

I've moved everything over to a new server, giving me 4 more bays (each with a 3TB SAS drive, in RAID, for about 7.8TB capacity total).

All apps, files etc. are on the old pool, and I want to move the data to the new (smaller overall) drives, then export the old pool, rename the new copy so it takes over the role of the old pool, then remove the old pool's drives, shove more in and extend the main pool back up to full size again.

Thus, I can free up the old drives and also move all main data from the JBOD to the primary server, clean down the JBOD and use that for data backups etc.

Struggling here with this.

I need a numpty guide on how to perform the tasks.

Taking a snapshot and using ZFS replication just totally fills the new disks and fails the job, so obviously it's either the wrong tool or it's moving data that I was unaware of.

Could it be snapshots? There are "quite a few" related to the old main pool; would it be safe to clear those down?

Would this remedy matters or would ZFS still try to create the 12TB volume in a 7.8TB hole?
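
I suppose a dry run of the send would at least tell me how big the replication stream is actually going to be before I commit; something like this (the snapshot name is just a placeholder, not one that exists yet):

zfs send -nv -R Storage1@some-snap    # -n = dry run, -v prints the estimated stream size, -R includes children and all their snapshots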

No real option to add more storage to the system, bar moving 2 old pool drives to the JBOD. I have extra SAS drives, but all the old drives are SATA, so I'm worried about mixing and matching…

I'm going to try an rsync and see if that moves just the filesystem contents. I have the pool config (main server config) saved to re-import once the drive shenanigans are all sorted…
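
Something along these lines is what I have in mind for the rsync; the flags are my best guess, and the destination path is a placeholder for wherever the new pool ends up mounted (-x stops it wandering across into anything separately mounted underneath):

rsync -aHAXx --info=progress2 /mnt/Storage1/ /mnt/newpool/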

Cheers.

K.

Giving it a try by taking a fresh snapshot and setting up a replication task using the custom name I gave it…
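
(For reference, the snapshot part from the shell is just something like the line below; the name here is a placeholder rather than the custom one I actually used.)

zfs snapshot -r Storage1@fresh-migration    # recursive snapshot across all child datasets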

Now just need to wait for about 8 hours or so to see if the pool overflows again… :).

ffs… Now the old pool is not showing any errors. (One of the reasons why I wanted to migrate was the “unhealthy pool” error.)

Half tempted to just leave it in there and give up, but I do want to migrate all the data to the new disks and stop having it spread all over the place. Plus the old pool disks will need to go back into the old R410 as I’ve got literally nowhere else to put them :).

Roll on this evening…

The only thing you've gained is maybe a little more time to move your data to the new server. Don't blow it off just because the error is now gone. Odds are it will come back, and you won't want to be rushing to save your data when it does.

Yes, snapshots can eat up space; the more changes you make, the more they eat up. Think about it: you delete a 10GB file, but the file does not actually go away, it is still there taking up space in case you need to recover it. Maybe a clean-up is needed, if that is what's causing your overflow issue.
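
Something like this will show you exactly how much of the pool the snapshots are hanging onto, in the USEDSNAP column (pool name taken from your posts):

zfs list -o space -r Storage1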

Maybe so :). Still want to do the move, as I'll be happier with everything together in one place. Just frustrated at failing to get just the data I need copied across… :).

I’m thinking of tidying up all the snapshots anyway…
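
Probably with something along these lines; the grep pattern is a guess at how the periodic snapshots are named here, and the -n keeps it a dry run so I can see what would be reclaimed before actually pulling the trigger:

zfs list -H -t snapshot -o name -r Storage1 | grep 'auto-' | xargs -n1 zfs destroy -nv
# drop the -n once the list looks sane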

Or saying "Stuff it" and finding a cheap 3PAR enclosure on eBay to replace the Heath-Robinson setup I have (old PC case, mobo, PSU, CPU, SAS expansion card + 6 SATA drives…) :).

Right,

Totally confused now and in serious need of help.

I set up the new empty pool and decided to simply do a recursive cp copy from old to new via the command line.

cp -iprv from /mnt/Storage1/ (all contents) to /mnt/bog (the new pool).

Set it running (at about 1/4 the speed of ZFS replication, it seems) and this morning, lo and behold, it has gone over the used-capacity size again.

I have no idea at all as to why it’s doing this???!?!?!?!?!

Is there possibly a self-referential directory link somewhere that is telling it to go back and copy the data again?

I have noticed that the Storage1 pool at the top level has all the sub-folders as expected, but the "Windows-data" folder also has a copy of (or reference to) them as well???

Not sure if this is needless duplication, a link to a complete copy of the main data, or something else…

The windows share (checking from Win10) says Windows-data is about 4.5TB in size.

That share is an SMB share connected to /mnt/Storage1/Windows-data/ so looks good there.

Not sure why I have this seemingly duplicate entry, or why, even though the pool only has 6.x TB used, even a basic copy operation is copying too much data???

Need some help on this one. I don't want to delete the sub-folders in "Windows-data", as I'm concerned I'll damage the data I want to keep rather than any potential duplicate.

If it's a self-referential link, how do I identify it and safely break it?
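
In the meantime I'm going to poke at it with something like this, to see whether it's a symlink, a nested dataset mount or a genuine second copy (paths from my setup):

find /mnt/Storage1 -maxdepth 3 -type l           # any symlinks near the top level?
zfs list -r Storage1 -o name,mountpoint,used     # which folders are real datasets, and where are they mounted?
du -xsh /mnt/Storage1/*                          # per-folder sizes without crossing into other filesystems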

Ta,

Stu.



I feel my only option at this point would be to put the drives back in their old server, fire that up, slap an extra 8TB in the new box and try to port everything over a direct SAN link using the secondary interfaces on the system.

Then hope it’s enough capacity. I’d need to wire the old HBA back into the old box.

Wasted about a week on this so far… :(.

Argh… :).

Stu.

Right. So in the end I swapped out the old backup pool (6TB plus 2*3TB drives) and managed to squeeze the old storage pool into the PC case that is my jury-rigged external array.

Added 4 drives back into the host (had to faff with them as it wouldn't wipe them for some reason), but now I have a 12.5TB max-size old pool and a 13.8TB new one.
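
(If anyone hits the same "won't wipe" problem, clearing the old labels/partition tables by hand from the shell is one way around it; the device name below is a placeholder, so triple-check it before running anything like this.)

zpool labelclear -f da5       # clear leftover ZFS labels from the disk
gpart destroy -F da5          # FreeBSD/CORE: remove the old partition table (wipefs -a /dev/sdX on SCALE)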

The first replication attempt seemed to say it was copying 1600+ snapshots, so I stopped that, took a fresh snapshot from the CLI, then fired it over.
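
The rough CLI shape of it, in case it helps anyone; everything except Storage1 is a placeholder name, and note there's no -R on the send, which is what keeps the 1600+ old snapshots out of it:

zfs snapshot -r Storage1@fresh
zfs send Storage1/Windows-data@fresh | zfs recv -F newpool/Windows-data    # repeat per child dataset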

Let’s see if it’s done in the morning……

Stu.



So, with the full pool in place and the old one in the external array, it ran through and seems to have stopped at just over 7TB used. Still more than expected, but less than the size of the pool I had set up before? Bear in mind I had left it running before and it ate up all of the 7.8TB I had previously assigned…

Confusing….

Need to check the server cli for issues and then do some file list comparisons to make sure nothing is missing……

Losing just shy of a TB in wasted space is manageable. I did notice the server was running a scrub job on the new pool during the data migration, so possibly that tidied up some stuff as it went along?

Going to export and re-import the pools to properly rename them, re-apply the server config, then move the backup drives back into the array and the old storage drives back into the old R410, and then strip everything down to get the GPU running… :).
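
The rename itself should just be an export and an import under the new name; roughly the following, assuming the new pool is still called bog (though it's probably safer to do the export/import through the GUI so the saved config stays in step):

zpool export Storage1        # get the old pool out of the way first
zpool export bog
zpool import bog Storage1    # re-import the new pool under the old name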

Due to padding, the same amount of data will take up increasingly more space at higher raidz levels, and more than it takes on mirrors (including single drives).
"Space", be that occupied space, available space or free space, is a complex topic with ZFS.
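
Rough illustration, assuming ashift=12 (4 KiB sectors): a single 4 KiB block costs 8 KiB on a mirror (two copies), but on raidz2 it costs one data plus two parity sectors = 12 KiB, and raidz allocations are also padded up to a multiple of parity+1 sectors, so small blocks can easily cost more on raidz2 than on a mirror. Larger records get closer to the nominal ratio, which is why the same ~6TB of data rarely lands at exactly the same used figure on a different layout.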

Looks like it. Adding the extra vdev (4*2TB in raidz) seems to have done the trick. I ran a quick diff -qr check for an hour or so and nothing went to the output file…
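
For the record, the rough CLI shape of that extend-and-verify step, with pool and device names as placeholders:

zpool add newpool raidz da10 da11 da12 da13                  # add the 4*2TB drives as a second raidz vdev
diff -qr /mnt/oldpool /mnt/newpool > /tmp/verify.log 2>&1    # any mismatches land in the log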

Looks like I just need to rename the pools etc. and re-set up the backups…