Evenly distributing data on disks

I currently have a pool of 10x 6TB drives in raidz2 that's half full. I also have a second pool of two mirrored vdevs using 4x 18TB drives, currently empty.

I want to snapshot the first pool and migrate that snapshot to the 2nd pool, then decommission the first pool.

If I expanded down the road by adding more vdevs of 2x 18TB mirrored drives each time, would the data redistribute evenly onto the new disks, or would it stay entirely on the first four disks?

Would snapshots be the best way to migrate the data? And would adding additional two-disk mirror vdevs be the best choice for performance, risk reduction, and resilvering speed? I don't mind the capacity loss of mirrors versus raidz.

No, and there’s no particular reason to desire this outcome. But if you do, there are some scripts floating around that would do it.

Specifically:

Yes for performance (IOPS), flexibility and resilvering speed.
No for risk reduction: you can only lose one disk (per vdev), and any hiccup with the remaining disk before the resilver completes puts data at risk.
For better protection, especially with these very large drives, you'd need 3-way mirrors (at an obvious cost in capacity) or raidz2, which you would then expand either by adding further raidz2 vdevs (minimum 4 disks at a time) or through the new raidz vdev expansion (with all its caveats and gotchas).
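
For reference, a rough sketch of what those expansion paths look like at the command line. The pool name tank2 and the disk paths are placeholders rather than anything from this thread, and the raidz expansion form needs OpenZFS 2.3 or newer:

# Add another 2-disk mirror vdev to a pool of mirrors
zpool add tank2 mirror /dev/disk/by-id/DISK_E /dev/disk/by-id/DISK_F

# Turn an existing 2-way mirror into a 3-way mirror by attaching a third disk
zpool attach tank2 /dev/disk/by-id/DISK_A /dev/disk/by-id/DISK_G

# Expand a raidz2 pool by adding a whole new raidz2 vdev (4+ disks at a time)
zpool add tank2 raidz2 /dev/disk/by-id/DISK_H /dev/disk/by-id/DISK_I /dev/disk/by-id/DISK_J /dev/disk/by-id/DISK_K

# Widen an existing raidz2 vdev by one disk (raidz expansion, OpenZFS 2.3+)
zpool attach tank2 raidz2-0 /dev/disk/by-id/DISK_L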

Those rebalancing scripts were mostly written before block-cloning was a thing.
I don't know whether block-cloning is enabled by default again, but if it is, the data might just get cloned instead of actually "distributed" across the disks…
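
To make that concern concrete (a hedged sketch, not one of the scripts mentioned above; the pool name and file paths are hypothetical): those scripts generally rewrite each file by copying it and swapping the copy into place, so with block cloning active the copy has to be forced to allocate new blocks rather than clone the old ones.

# Check whether the block_cloning feature is active on the pool
zpool get feature@block_cloning tank

# Naive per-file rewrite, forcing a real copy instead of a clone
# (GNU coreutils cp; --reflink=never disables clone/reflink copies)
cp -a --reflink=never /tank/data/big.bin /tank/data/big.bin.rebalance
mv /tank/data/big.bin.rebalance /tank/data/big.bin

Note that existing snapshots still pin the old blocks, so the rewrite neither frees nor really moves snapshotted data until those snapshots are destroyed.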

The reason I wanted even data "distribution" is so that if I lose the original first 2 vdevs, I wouldn't lose everything: with the data spread out evenly, I'd only lose some of it.

Does it work like this?

No, it does not. Loss of any vdev is loss of the entire pool.

It's hard to say what the options would be without knowing what you are using the server for.

For a general idea of how different layouts compare, Calomel.org has a decent article that explains the basics with examples.
https://calomel.org/zfs_raid_speed_capacity.html

Data is broken into chunks ("recordsize" is at play here), and those chunks are distributed across vdevs, so losing one vdev would mean that most large files would NOT be recoverable from the remaining vdevs, even using Klennet ZFS Recovery.
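
For illustration, recordsize is a per-dataset property you can inspect and tune (the dataset name here is just an example, and changing it only affects newly written data):

# Default recordsize is 128K
zfs get recordsize tank/media

# Larger records can suit big sequential files
zfs set recordsize=1M tank/media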

My first ZFS system was in 2015, during the FreeNAS days; now, 10 years later, I've forgotten all the basic nuances.

Thanks everyone for clarifying these n00b questions :slight_smile:

Yes.

(Blah blah blah)
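
A minimal sketch of that snapshot-and-send migration, with placeholder pool and snapshot names (oldpool, newpool, @migrate, @final). One caveat: receiving with -F will roll back/overwrite what's on the target, so only point it at an empty pool.

# Recursive snapshot of everything on the source pool
zfs snapshot -r oldpool@migrate

# Replicate the full hierarchy (datasets, properties, snapshots) to the new pool
zfs send -R oldpool@migrate | zfs receive -Fu newpool

# After stopping writes, send only the changes since the first snapshot
zfs snapshot -r oldpool@final
zfs send -R -I @migrate oldpool@final | zfs receive -Fu newpool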

New data will be biased towards the emptiest vdevs, meaning that as data is rewritten it will rebalance over time.

Of course data that is never rewritten will stay where it was written.

Or you can rebalance.

Or you could just restore from your backup :wink:
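
You can actually watch that bias happen: per-vdev write distribution shows up in zpool iostat (pool name is just an example).

# Per-vdev throughput, refreshed every 5 seconds; new writes skew towards the emptier vdevs
zpool iostat -v tank 5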

So this implies that if I were to purge old files and snapshots, new files would still get written to the emptier vdevs. But would this still have the pool-fragmentation problem seen with older pools filled more than 50%?

Is it a problem?

Fragmentation is per vdev…

# zpool list -v tank
NAME                                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank                                      54.4T  21.0T  33.4T        -         -    21%    38%  1.00x    ONLINE  /mnt
  mirror-0                                7.27T  2.55T  4.71T        -         -    16%  35.1%      -    ONLINE
    1b2a34e0-e57c-4a5b-9033-ca0fe03f51a3  7.28T      -      -        -         -      -      -      -    ONLINE
    26609e36-6148-48b2-9c12-309a385d17a6  7.28T      -      -        -         -      -      -      -    ONLINE
  mirror-1                                7.27T  2.38T  4.89T        -         -    19%  32.7%      -    ONLINE
    a167e02d-4382-4b04-9010-31036b3a29b5  7.28T      -      -        -         -      -      -      -    ONLINE
    37d7938e-d4dd-4e94-8005-7a4358c4ee17  7.28T      -      -        -         -      -      -      -    ONLINE
  mirror-2                                7.27T  2.33T  4.93T        -         -    19%  32.1%      -    ONLINE
    1c04a049-4bb4-49ef-910d-0177986ce9ee  7.28T      -      -        -         -      -      -      -    ONLINE
    3ec1932e-8ece-4b20-83bb-6ffb0accab4b  7.28T      -      -        -         -      -      -      -    ONLINE
  mirror-3                                7.27T  2.37T  4.89T        -         -    19%  32.6%      -    ONLINE
    a22e2891-1e80-4574-a402-a74eb39181be  7.28T      -      -        -         -      -      -      -    ONLINE
    f30fa3bc-a8a4-4b28-8467-1023f58a9303  7.28T      -      -        -         -      -      -      -    ONLINE
  mirror-4                                3.62T  2.33T  1.29T        -         -    39%  64.3%      -    ONLINE
    a2a37f56-b979-4658-bf02-6ec89f90d1fb  3.64T      -      -        -         -      -      -      -    ONLINE
    3c8ac3c7-ccd8-405c-899c-4cf137e698f5  3.64T      -      -        -         -      -      -      -    ONLINE
  mirror-5                                3.62T  2.41T  1.21T        -         -    35%  66.6%      -    ONLINE
    8b6c50d7-8227-4175-a41b-25bf3f8f49e7  3.64T      -      -        -         -      -      -      -    ONLINE
    bdc54896-4343-46b0-a66f-222aa5f536f5  3.64T      -      -        -         -      -      -      -    ONLINE
  mirror-6                                3.62T  2.32T  1.31T        -         -    36%  63.9%      -    ONLINE
    42b8b194-e804-4682-8d16-eeaa7b006243  3.64T      -      -        -         -      -      -      -    ONLINE
    606cee8f-da2c-49c4-9663-3aa3368808c9  3.64T      -      -        -         -      -      -      -    ONLINE
  mirror-7                                3.62T  2.43T  1.19T        -         -    33%  67.1%      -    ONLINE
    3dc6f4ef-78e3-4bd2-95fe-73ff17507f7c  3.64T      -      -        -         -      -      -      -    ONLINE
    c3e0ab70-6820-43c0-9037-a0a8d4ea3a70  3.64T      -      -        -         -      -      -      -    ONLINE
  mirror-8                                3.62T  1.85T  1.77T        -         -    27%  51.1%      -    ONLINE
    1f18dc41-5935-4059-823b-bf0874742deb  3.64T      -      -        -         -      -      -      -    ONLINE
    f6c62920-d767-4865-90e6-064bdb757a47  3.64T      -      -        -         -      -      -      -    ONLINE
  mirror-15                               3.62T  40.1G  3.59T        -         -     1%  1.08%      -    ONLINE
    2de3cf07-af92-4ba6-ba1d-b5da5f5316de  3.64T      -      -        -         -      -      -      -    ONLINE
    bf819af9-1504-4d59-bc4a-c65606c0f0fd  3.64T      -      -        -         -      -      -      -    ONLINE
  mirror-16                               3.62T  31.5G  3.59T        -         -     1%  0.84%      -    ONLINE
    017759b7-6b9b-47b5-a210-1a7919007bde  3.64T      -      -        -         -      -      -      -    ONLINE
    a92382bd-42b5-4729-bcfd-677bc6a6380f  3.64T      -      -        -         -      -      -      -    ONLINE
logs                                          -      -      -        -         -      -      -      -         -
  nvme0n1p1                               93.2G  5.40M  93.0G        -         -     0%  0.00%      -    ONLINE
spare                                         -      -      -        -         -      -      -      -         -
  26f09fad-8412-4892-a89b-4ed0ff6c1763    7.28T      -      -        -         -      -      -      -     AVAIL

I honestly think people spend too much time worrying about this stuff.

Notice my pool is in the process of growing via replacing 4T disks with 8T disks… The fullest vdevs are the oldest vdevs… and they were the most fragmented… but now they have lots of free space… and are relatively less fragmented…

And notice that all but the newest vdevs are fairly well balanced (fill-wise), even though I've been adding vdevs over time.

And yes, I’m using mirrors, because raidz didn’t provide the performance I needed to saturate 10gbe.

If performance was that important, perhaps I’d be using SSDs these days.
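
For anyone following along, that replace-and-grow path looks roughly like this (device names are placeholders, and OLD_DISK_GUID stands in for whichever old disk is being swapped):

# Let vdevs grow automatically once every disk in them has been replaced with a larger one
zpool set autoexpand=on tank

# Swap one side of a mirror for a bigger disk, wait for the resilver, then do its partner
zpool replace tank OLD_DISK_GUID /dev/disk/by-id/NEW_8T_DISK
zpool status tank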

Thank you for those concrete numbers! I just remember that 10 years ago the gurus made a big deal about squeezing every last bit of performance and tuning out of ZFS.

I never bothered with ARC or L2ARC tuning; I just throw more memory at the problem :slight_smile:

Likewise, I'll just add more mirrored vdevs if capacity is needed and not worry about even distribution or fragmentation.
