[Accepted] Create 2 GiB buffer space when adding a disk

The more I think about it, the more it upsets me.

Storage wants to use KB, MB, GB, and TB? Powers of 10. Fine.

Yet this same storage is formatted with sector sizes of either 512 or 4096 bytes.

STORAGE MANUFACTURERS: WHY NOT MAKE THE SECTOR SIZES 500 and 4000 BYTES?

What a bunch of weasels…


EDIT:

Before someone "corrects" me about this...

That’s the joke. Storage media’s smallest write unit adheres to binary measurements. Drive manufacturers know it only makes sense to size their devices in binary units, yet they still intentionally use decimal units for blatant marketing reasons.
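If you want to see the size of that marketing gap in numbers, here’s a quick back-of-the-envelope check (the 12 TB figure is just an example; Python used purely as a calculator):

```python
# Decimal ("marketing") units vs. binary units (what the OS reports).
marketed_bytes = 12 * 10**12            # a "12 TB" drive, decimal terabytes
tib = 2**40                             # one tebibyte (TiB)

print(marketed_bytes / tib)             # ~10.91 -- the "12 TB" drive in TiB
print(512 == 2**9, 4096 == 2**12)       # sector sizes are exact powers of two
```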

ChatGPT’s impression of the Seagate marketing department back in the day.

The one in the foreground will be in a ton of pain and will require hip surgery … great work, AI.

1 Like

What I dislike is when someone uses “Class” to describe something that falls short of the real thing. For example: a 65" Class TV (actual size 64.12"), a 7200 RPM “Class” hard drive (actual rotational speed could be 5400), or a 12 TB “Class” hard drive (actual capacity 11.1 TB). The marketing world found another way to short-change the consumer. Will it ever end?

Don’t kid around like that, someone may hear you.

1 Like

This is very unfair to mustelidae… (And we have a very helpful resident one here!)
Now, do you have any insight into the WD marketing department when it decided, first, to sneak SMR drives into the Red line, and then that “rpm” is not a technical parameter but just your impression?

(The prompt you used should be a good start, substituting “weasels” by “worshippers of Nyarlathotep, the Crawler in Chaos”.)

2 Likes

That is … quite specific. This is how ChatGPT imagines that.

2 Likes

This is the biggest issue. We have three groups of users:

  1. Old vdevs with 2 GiB buffer (“swap”) partitions
  2. Newer vdevs without buffer partitions.
  3. Brand new vdevs… partitions are TBD

I think any software change has to be OK with all of them.

(or @winnielinnie has to pay me much more than $30)

@yorick improved the request, making it more flexible than “always use a 2 GiB buffer”.

He explains it in his “edit” in the opening post. I illustrate it here.

Unfortunately, those who created pools/vdevs with versions of SCALE released after the change was made are up a creek without a paddle. :confused:


You drive a hard bargain.

Fine. I’ll up my offer: a handshake, two emojis of your choice, and a weekly “thank you” message emailed to your inbox.

I don’t know how I can offer more than this.

2 Likes

It’s unfortunate ZFS can’t shrink pools. That would be an easy solution otherwise.

1 Like

Ticket opened on Jul 3, 2017.

Yeah, it’s not happening in our lifetime.

1 Like

Well, if iX decided it’s needed, they could just throw an engineer at it. They are one of the ZFS contributors :slight_smile:
But I guess the original buffer solution is cheaper.

And it absolutely would be.

New vdev: Create partition 2 GiB smaller than max

Replacement drive or added drive on existing vdev: Ditto, but if that turns out smaller than the smallest member, increase the partition size to match, if possible.

Old vdev with 2 GiB buffer (“swap”) partition: This works. The replacement drive may not have a buffer partition, but it still has a buffer.

Old vdev without buffer: No change from today. Slightly smaller drives don’t fit. Nothing to be done about that; the UX damage created by the change that removed the buffer can’t just be un-created.

New vdev: Works. ZFS partitions are created 2 GiB smaller, creating a buffer.

Any kind of vdev and adding a larger drive: Works. The larger drive gets a partition with a 2 GiB buffer; gradual replacement and vdev expansion that way continues to work.

It’s a pretty small change to the existing logic and it absolutely works with any vdev ever created in FreeNAS or TrueNAS, any edition.
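To make the sizing rule above concrete, here’s a rough sketch in throwaway Python. The function and parameter names are made up for illustration and are not the actual middleware code:

```python
GIB = 1024**3
BUFFER = 2 * GIB  # desired amount of unpartitioned buffer space per disk

def data_partition_size(disk_size, smallest_member=None):
    """Illustrative sketch only: size (bytes) of the ZFS data partition.

    disk_size: usable capacity of the new disk, in bytes.
    smallest_member: size of the smallest existing data partition in the
        vdev, in bytes, or None when creating a brand-new vdev.
    """
    size = disk_size - BUFFER  # new vdev: 2 GiB smaller than max

    if smallest_member is not None and size < smallest_member:
        # Replacement or added disk: grow the partition back up to match
        # the smallest member, shrinking the buffer as far as needed.
        size = min(disk_size, smallest_member)

    if size <= 0 or (smallest_member is not None and size < smallest_member):
        raise ValueError("disk too small for this vdev")

    return size
```

Old vdevs with the legacy swap partition fall out of this naturally: their data partitions are already about 2 GiB smaller than the disk, so a same-size replacement sized this way will fit.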

4 Likes

What’s the etiquette around evolving feature requests? This started as “buffer partition” and then with discussion evolved to “buffer space”.

I’ve left the original subject line and post and added an Edit: because people may have voted for the original. While the refined design is the same in spirit, it’s not identical.

On the other hand, I can see the argument for accurately reflecting what the ask is: it’s not to create a buffer partition, it’s to create a buffer.

Change subject line and original ask, or nay?

Well, people can also choose to remove their votes as a result of the discussion evolving. Can’t imagine that happening much.

I think it’d be better to have the subject be accurate.

1 Like

Done. Also created explicit user stories for the types of existing or new vdevs and use cases I can think of.

What would you change further?

2 Likes

I think it’s perfect and it gets the message across.

It’s an important feature request with little cost (boo hoo, you “lose” 2 GiB from your 12 TB drive), yet it spares users from surprises in the future when they need to replace a drive or expand a vdev.

A rudimentary form of this already existed with FreeNAS, TrueNAS CORE, and early versions of SCALE.
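To put “little cost” into perspective (12 TB used as an example size again):

```python
buffer_bytes = 2 * 1024**3                  # the proposed 2 GiB buffer
disk_bytes = 12 * 10**12                    # a "12 TB" drive
print(f"{buffer_bytes / disk_bytes:.4%}")   # ~0.0179% of the disk
```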

1 Like

I looked it up on my existing vdevs: TrueNAS CORE used 4194304 512-byte sectors, exactly 2 GiB; TrueNAS SCALE used 4194305 sectors, 2 GiB plus 512 bytes.

Adjusted the feature request to ask for 2 GiB and reference 2 GiB legacy “swap” partitions.
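For reference, the arithmetic behind those sector counts:

```python
SECTOR = 512
print(4194304 * SECTOR)                   # 2147483648 bytes
print(4194304 * SECTOR == 2 * 1024**3)    # True: exactly 2 GiB (CORE)
print(4194305 * SECTOR - 2 * 1024**3)     # 512: SCALE's extra half-KiB
```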

3 Likes

God yes. I want to suggest a minor adjustment, something easy that takes minimal engineering and testing time (especially since a version of this feature existed until 24.04.1).

Not a raidz-expansion-sized feature change to ZFS.

But it’s quite sad. I created my pools on SCALE, so now I don’t have a buffer, and I will be forever condemned to live in fear that my eventual replacement disk will be 1 MB too small and the replacement will fail. :sob:

Yeah, in your case there are no great options. Put a sticky note somewhere to buy “one size up” when a drive fails, I guess :sweat_smile: .

If you replace a 12 TB drive with a 14 TB drive, you’re golden, no matter what.

Once all vdev members have been replaced with larger drives, the new capacity becomes available.
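Under the hood that relies on ZFS autoexpand, which TrueNAS normally enables for you; on a hand-rolled pool you’d set it yourself. A minimal sketch, with a placeholder pool name, wrapping the stock zpool commands:

```python
import subprocess

pool = "tank"  # placeholder pool name

# Let the pool grow automatically once every vdev member has been
# replaced with a larger drive.
subprocess.run(["zpool", "set", "autoexpand=on", pool], check=True)

# Or expand a specific, already-replaced device by hand:
# subprocess.run(["zpool", "online", "-e", pool, "<device>"], check=True)
```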

You could also, though this ain’t fun:

  • Move all data off the pool and verify it’s fine
  • Destroy the pool
  • Boot into 23.10 from temp media
  • Create a fresh pool (this one will have swap partitions, your buffer partitions)
  • Export the pool
  • Remove temp media and boot back into production SCALE
  • Import the pool
  • Restore data to it
  • Depending on your backup/restore process, recreate permissions and ACLs … better, though, to choose a method that retains ZFS metadata (dataset properties, ACLs, etc.), such as ZFS send/receive (see the sketch below)
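If you go the ZFS send/receive route, the replication step could look roughly like this; pool and snapshot names below are placeholders (the source being wherever you parked the data):

```python
import subprocess

src = "backuppool"       # placeholder: pool currently holding your data
dst = "newpool"          # placeholder: freshly created pool with buffer
snap = f"{src}@migrate"  # placeholder snapshot name

# Recursive snapshot of everything on the source pool.
subprocess.run(["zfs", "snapshot", "-r", snap], check=True)

# Replicate recursively: -R carries datasets, snapshots, and properties
# (so ACLs and dataset settings survive); -F lets the destination accept
# the stream as-is.
send = subprocess.Popen(["zfs", "send", "-R", snap], stdout=subprocess.PIPE)
subprocess.run(["zfs", "receive", "-F", dst], stdin=send.stdout, check=True)
send.stdout.close()
send.wait()
```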
1 Like

You could also, and this is fun:

  • Build a time machine
  • Travel back to May 2024
  • Spill coffee on the iX dev’s keyboard before they can commit the change
1 Like