Mirroring an existing one-SSD pool with a slightly smaller drive

I’ve got a non-boot pool made up of one 2 TB SSD that I’ve been using for some years, and I now want to mirror it. I followed the instructions here and attempted to create a mirror with another 2 TB SSD, but it threw an error saying the new drive is too small.

Upon further inspection, it turns out the “2 TB” of the first drive is actually 1.86 TB, while the “2 TB” of the second one is 1.82 TB. So the new, would-be mirror drive is indeed very slightly smaller than the existing drive. (As an aside: SSD manufacturers really need to come clean about what they’re selling!)

I’m currently only using half the capacity of the 1.86 TB SSD, so the best solution here would seem to be shrinking the vdev ever so slightly, from 1.86 to 1.82 TB, and then creating the mirror. Being able to do this seems vital: most 2 TB SSDs on the market don’t tell consumers their exact capacity, so when shopping for a replacement I could go through any number of 2 TB drives that won’t work with TrueNAS if the partition size can’t be shrunk a bit.

Can this be done? If so, how?

It may be relevant that the in-use SSD was created ages ago, and uses GELI encryption.

You cannot shrink a vdev. Create a new pool on the smaller drive, copy the data, destroy the old pool, and then attach the larger drive to the new pool to create a mirror.
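
For reference, a rough CLI sketch of that sequence (pool and device names are just examples; on TrueNAS you would normally do each step through the GUI so the middleware stays aware of the pool):

# create a new single-disk pool on the smaller SSD
zpool create newpool ada1

# copy the data over (e.g. with ZFS replication, see below), then retire the old pool
zpool destroy oldpool

# attach the now-free larger SSD to the new pool's disk, turning it into a two-way mirror
zpool attach newpool ada1 ada0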

2 Likes

Thanks for the answer. What’s the best way to copy the data? My instincts say rsync, but that somehow feels very Linuxy.

With stripes and mirrors you can add and remove vdevs though.

Add your 2nd disk as a VDev. Then remove (in the GUI) the first. That should evacuate the contents of the first VDev onto the second VDev.

Then you can “extend” the second disk with the now removed first disk.

And you should now have a mirror.

Ps: I have not personally tried this, but in theory it should work.
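
For anyone curious what the GUI does under the hood, the rough CLI equivalent is something like the following (device and pool names are placeholders, and with GELI the real vdev members would be gptid/….eli providers, which is one more reason to let the GUI drive):

# add the new SSD as a second top-level (striped) vdev
zpool add tank ada1

# remove the original vdev; ZFS evacuates its blocks onto the remaining vdev
zpool remove tank ada0

# once the removal has finished, attach the old disk to the new one to form a mirror
zpool attach tank ada1 ada0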

3 Likes

The best way to copy data pool to pool is to use zfs replication.

But it’s probably best to just avoid it by adding/removing vdevs.
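
In TrueNAS this is usually set up as a local Replication Task in the GUI; the bare-bones CLI version looks roughly like this (pool names and the snapshot name are just examples):

# snapshot the whole old pool recursively, then send the full tree to the new pool
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -F newpool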

A little bit more risky, but clever thinking indeed 🙂

1 Like

I’ve tried to follow what you mean in your first sentence, but I’m not sure I’m getting it right. In Storage, I hit the cogwheel of my existing pool, select Add Vdevs, and I’m presented with the option of adding my spare SSD to the pool… but only as part of a striped vdev. This throws the following dire warning, in red:

Caution: A stripe data vdev is highly discouraged and will result in data loss if it fails

1. Is this actually what you suggested I do, before removing the first disk’s vdev?

2. Also, once I’ve removed the first disk’s vdev and its contents begin to be transferred to the second disk’s vdev, how do I know when the process is complete? This process seems to be resilvering in all but name, but @pmh’s comment makes me think it’s a bit hacky and therefore might not show the normal resilvering progress messages.

3. Do I need to shut down the jails and VMs that are on the first vdev, as well as avoid writing data to it, while this data transfer process is in place?

4. My current vdev is GELI-encrypted and has not just data on it, but also a VM on a ZVol. Will any of this cause problems with your method?

I think that is what he is talking about: adding it as a stripe. Then, through the GUI, you tell it to remove the first VDEV, and it should move the data to that second VDEV. After that completes, you add that first disk to the VDEV, but as a mirror.

I just saw your edits. I suggest waiting for Stux to reply.

1 Like

Yes. This is true, but you already have a stripe vdev, right? We could argue about which is safer: replicating your data to another pool… and then having only a single copy while building the mirror… or evacuating your data to another vdev… and then having only a single copy while building the mirror.

The warning is scary because it’s trying to discourage you from doing what you’ve already done, which is set up a non-redundant pool.

1. Is this actually what you suggested I do, before removing the first disk’s vdev?

Yes.

2. Also, once I’ve removed the first disk’s vdev and its contents begin to be transferred to the second disk’s vdev, how do I know when the process is complete? This process seems to be resilvering in all but name, but @pmh’s comment makes me think it’s a bit hacky and therefore might not show the normal resilvering progress messages.

It’s a clever way to use an existing feature to do what you want. ZFS device removal was first introduced about 10 years ago.

3. Do I need to shut down the jails and VMs that are on the first vdev, as well as avoid writing data to it, while this data transfer process is in place?

No. The device removal proceeds while the pool is online. This is a benefit of doing it this way. The con is that you will have a removed-device mapping table in memory until all blocks that were originally on the now-removed device are eventually rewritten as part of normal pool churn. Of course, that may never happen.
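
If you’d rather watch the progress from a shell than from the GUI, zpool status reports the evacuation while it runs (pool name is a placeholder):

# the status output includes a "remove:" section while blocks are being evacuated
zpool status -v tank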

4. My current vdev is GELI-encrypted and has not just data on it, but also a VM on a ZVol. Will any of this cause problems with your method?

I’m sorry I didn’t notice the GELI detail upfront. In theory it shouldn’t matter, but I have no idea what the actual effect of GELI will be. It’s possible the new vdev won’t be encrypted… and the old one will be removed. Or not.

I expect it will work if you do everything by the GUI, but I don’t know, and I have never used GELI… which is also deprecated… and you will want to remove it before migrating to SCALE as well.

My understanding is that GELI is whole-device encryption applied at a layer below ZFS, so replacing a GELI-encrypted disk with a non-encrypted device would mean the new disk is not encrypted.
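
A quick way to see that layering on a CORE box (nothing here is specific to this pool, just standard FreeBSD tools):

# list active GELI providers and the raw devices underneath them
geli status

# compare with the devices the pool is actually built on; GELI-backed vdevs show up as .eli providers
zpool status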

1 Like

Not Stux, but yes.

Monitor it in the pool status page, or in the Tasks list in the GUI.

Nope; the process should be transparent to applications.

That’s probably a problem.

2 Likes

Probably time for you to hit the documentation on encryption and start looking at a migration plan, now or later.

1 Like

Thanks for that amazing response! I’m going to see about upgrading from the old GELI encryption first (this server has served me incredibly well, since FreeNAS 11.something, so I’ve had little desire to mess with it), and then I may well use the process you outlined here.

Thanks a ton for your answers. Especially the warning about upgrading from GELI encryption. That’s going to be my next step.

Sorry to necro this thread. I decided the safest route was to order a new SSD that was the exact same size as the original one, hence the delay. But now that it’s arrived, it turns out that this third SSD is also ever so slightly smaller than the original, even though it’s from the same manufacturer and is sold as being 2 TB, too. Gads!

Couldn’t we just have re-sizable VDEVs finally?

Anyway, I then tried @Stux’s idea of adding the new disk as a VDEV (see post above). And unfortunately, it errored out!

Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 355, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 391, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 868, in do_update
    await self.middleware.call('disk.geli_passphrase', pool, None)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1251, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/disk_/encryption_freebsd.py", line 185, in geli_passphrase
    self.geli_setkey(dev, pool['encryptkey_path'], GELI_KEY_SLOT, passphrase)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/disk_/encryption_freebsd.py", line 199, in geli_setkey
    raise CallError(f'Unable to set passphrase on {dev}: {cp.stderr.decode()}')
middlewared.service_exception.CallError: [EFAULT] Unable to set passphrase on gptid/[REDACTED].eli: geli: Cannot read metadata from gptid/[REDACTED].eli: Invalid argument.

I can sort of understand why the “Unable to set passphrase” error occurs: the drive is GELI-encrypted, and I guess TrueNAS hasn’t kept as much legacy support as advertised.

I can’t understand what could be causing the “Cannot read metadata” error.

Regardless, what can I do now? I have the original, GELI-encrypted SSD as a single-device VDEV, plus two other SSDs that are about 40 MB smaller than it. And I simply want to make a mirror of the original.

Back to some of the early answers:
Make a new pool, and replicate your data to it—removing GELI encryption in the process.

2 Likes

Thanks. I just went to do this, and it turns out I can’t create a new VDEV. I get the following error:

[EFAULT] Failed to wipe disk ada1: [Errno 1] Operation not permitted: '/dev/ada1'

Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 355, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 391, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 653, in do_create
    await self.middleware.call('pool.format_disks', job, disks)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1240, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool_/format_disks.py", line 28, in format_disks
    await asyncio_map(format_disk, disks.items(), limit=16)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/asyncio_.py", line 16, in asyncio_map
    return await asyncio.gather(*futures)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/asyncio_.py", line 13, in func
    return await real_func(arg)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool_/format_disks.py", line 24, in format_disk
    await self.middleware.call('disk.format', disk, swapgb if config['create_swap'] else 0, False)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1251, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/disk_/format.py", line 21, in format
    raise CallError(f'Failed to wipe disk {disk}: {job.error}')
middlewared.service_exception.CallError: [EFAULT] Failed to wipe disk ada1: [Errno 1] Operation not permitted: '/dev/ada1'

After no small amount of Binging, I ran sysctl kern.geom.debugflags=16 as root and tried to create the new VDEV again. Now it gives me a different error:

[EFAULT] Unable to GPT format the disk "ada1": gpart: geom 'ada1': File exists

Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 355, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 391, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 653, in do_create
    await self.middleware.call('pool.format_disks', job, disks)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1240, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool_/format_disks.py", line 28, in format_disks
    await asyncio_map(format_disk, disks.items(), limit=16)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/asyncio_.py", line 16, in asyncio_map
    return await asyncio.gather(*futures)
  File "/usr/local/lib/python3.9/site-packages/middlewared/utils/asyncio_.py", line 13, in func
    return await real_func(arg)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool_/format_disks.py", line 24, in format_disk
    await self.middleware.call('disk.format', disk, swapgb if config['create_swap'] else 0, False)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1251, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/disk_/format.py", line 46, in format
    raise CallError(f'Unable to GPT format the disk "{disk}": {cp.stderr}')
middlewared.service_exception.CallError: [EFAULT] Unable to GPT format the disk "ada1": gpart: geom 'ada1': File exists

I then proceeded to run dd if=/dev/zero bs=1M count=100 of=/dev/ada1 and then tried to create a VDEV with ada1. I got the same error as above.

I have no idea how to proceed from here.

1 Like

Anyone?
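
A likely reason the dd didn’t help: GPT keeps a backup copy of the partition table at the end of the disk, so zeroing the first 100 MB leaves that copy intact and gpart still sees an existing geom. A possible next step, assuming ada1 really is the spare disk and nothing on it is in use, would be to wipe the partitioning explicitly before retrying the pool creation:

# destroy the existing partition table on ada1, including the backup GPT at the end of the disk
gpart destroy -F ada1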