On last week’s episode of TrueNAS Tech Talk, the subject of “ZFS rewrite” came up.
Apparently, this is a `zfs` subcommand that reads and rewrites existing blocks of data while bypassing some userspace overhead. It does not suffer from the “double allocation problem” associated with conventional file copying in the presence of snapshots. Nor does it, supposedly, modify any filesystem metadata, such as timestamps and filenames.
Sounds great, right?
Did you change the dataset’s recordsize and want to retroactively apply it to existing files? Now you can.
Did you change the dataset’s compression and want to retroactively apply it to existing files? Now you can.
Did you add a new vdev to your pool to increase its total capacity, and now you wish to rebalance your existing data across all vdevs? Now you can.
Did you expand your RAIDZ vdev, and now you wish to rebalance the existing data and use the more “efficient” allocation (and more accurate space calculation)? Now you can.
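For what it’s worth, here is my rough sketch of the workflow those use cases seem to imply. The exact flags are my assumption from skimming the PR (check `man zfs-rewrite` on a build that actually ships it), and the pool and path names are made up:

```shell
# Change the properties first -- they only affect newly written blocks:
zfs set recordsize=1M compression=zstd tank/media

# Then rewrite the existing data in place so it picks up the new
# recordsize/compression (and, after adding or expanding a vdev,
# gets re-allocated across the current vdev layout):
zfs rewrite -r /mnt/tank/media
```

Please correct me if the invocation above is wrong.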
I’m not quite sold yet because I really don’t know what is happening. I tried to read through the PR on OpenZFS GitHub, but the technical jargon flew over my head.
I’ll ask my questions like I’m 5 years old, if anyone would be so kind to answer them like I’m 5.
@mav, since you wrote the code, I’d very much appreciate if you are able to give “user-friendly” explanations.
I apologize if these were already answered on GitHub. I tried my best to read through the entire thread, but got lost in some of the technical stuff.
1. What happens if you lose power or the system crashes in the middle of running `zfs rewrite`?
2. Are the data blocks that comprise the file the only things being read and rewritten? No metadata blocks are being touched?
3. If you have a 1-MiB incompressible file currently saved under a dataset with `recordsize=128K`, it is comprised of 8 blocks. If you change the dataset to `recordsize=1M` and then run `zfs rewrite` on the file, will it now be comprised of a single 1-MiB block? If so, doesn’t this mean that `zfs rewrite` in a sense “violates” a rule of ZFS? What happens to an existing snapshot that refers to the 8 blocks of data?
4. Similar to point 3, what happens if you go in the opposite direction? You change the `recordsize` from 1M to 128K. A snapshot referring to a single block (with a unique pointer) must now point to 8 different blocks with 8 different pointers?
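Just to spell out the arithmetic I’m assuming in points 3 and 4 (this is my own illustration, not anything from the ZFS code):

```python
def block_count(file_size: int, recordsize: int) -> int:
    """Number of records a file of file_size bytes occupies at a given
    recordsize, assuming no compression (ceiling division)."""
    return -(-file_size // recordsize)

MIB = 1024 * 1024

# A 1-MiB incompressible file:
print(block_count(1 * MIB, 128 * 1024))  # 8 blocks at recordsize=128K
print(block_count(1 * MIB, 1 * MIB))     # 1 block at recordsize=1M
```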
This makes me feel uneasy, since mucking around with existing blocks of data that are referenced by snapshots seems risky and could introduce unpredictable bugs in the future.
I might not be interpreting this correctly, and could be mistaken about what `zfs rewrite` actually does to existing blocks and how it affects current snapshots.