Have you used BTRFS recently? And what's your opinion?

We talk about ZFS a lot here, but it does have some competition. For Linux, that can be BTRFS, which was started by Oracle in 2007 and released in 2009.

For me, I had originally used 2 root partitions (in some cases on the same disk) so I could avoid the dreaded Gentoo Linux failed upgrade. Back then, I tended to perform a full backup just before updating, and I have had to perform several full restores. Annoying.

So, I created a second root partition and made alternate boot environments out of them. With separate Grub entries for both. (This was inspired by Solaris’ alternate boot environments…) This worked very well, but ate up twice as much disk space for the root file system.

When BTRFS came out as reasonably stable, I gave that a try, maybe around 2011. It had quirks.

The most annoying for me was that a new snapshot & clone could not be referenced by name in Grub; I had to use its ID. That meant I had to edit both the old BE and new BE’s Grub configuration file. But that was fixed, and then I was able to use the BE’s name in Grub, which simplified the creation of the alternate boot environment.

However, by the time 2014 rolled around, BTRFS still was not considered production ready. To my knowledge, I had not lost any data but then why wasn’t the Linux community jumping on the BTRFS bandwagon?

So, during a build of my new media server, I tried Gentoo Linux’s OpenZFS implementation. That went reasonably well, though recovery tended to be problematic. (Finding Linux boot image with recent OpenZFS…)

Eventually I converted all my computers over to OpenZFS and, when possible, mirrored the root. Other than a few quirks during updates, everything was good.

One quirk with OpenZFS was that between releases, the API was not stable. Thus, you had to match the kernel with userland code, otherwise it would not work. Especially having to recreate the initial RAM disk for the new kernel modules. Again, eventually this was fixed.

So, have any of you used BTRFS recently?
If so, how was your experience?

Note, we don’t need to bash BTRFS here. Nor describe its shortcomings.

I only “use” BTRFS as the root (and home) filesystem on my Garuda (Arch-based) desktop.

On more than one occasion, I had to resort to doing a “rollback” after borking things by playing around with packages and updates (which was my own fault). The rollback, like with a ZFS snapshot, was instantaneous.

As for the btrfs commands? I find them to be less intuitive than ZFS. (Correct me if I’m wrong, but why does a BTRFS (sub)volume need to be mounted in order to set a property?) :woozy_face:

I also find its flexibility (and features) lacking compared to ZFS.

The reason I even use it is because of the tenuous support of using ZFS on Linux, for the reasons stated above by @Arwen.

I also do not see any reason to use it for a NAS or pure storage solution. Sure, for a desktop or root/home filesystem on a daily driver, which I am currently using. But not as a dedicated storage server.

I suppose the dark horse to keep an eye out for is bcachefs, which could become a formidable (native) alternative option for Linux to compete against ZFS.

I did recently when testing TrueNAS and another NAS solution to decide which way to jump. My experience with btrfs was not a happy one, though I am unclear whether it was the NAS middleware or the btrfs that was the key issue or the combo of the two, but ultimately I lost all data in such a way that even snapshots were lunched & totally unrecoverable. The decision after that was an easy one for me.

I had forgotten about bcachefs. But, it has 15 more years to catch up with ZFS… (ZFS was started about 2000 and bcachefs was started about 2015.) Okay, that was unfair. Lots of new concepts that were unique or very new, can be implemented today with less code as well as fewer man hours.

But, checking Wikipedia, I see this for bcachefs in regards to stability;

From reading about bcachefs, it appears to be at least 2 years away. Perhaps even 5. What I mean is reasonable features, like: scrub; on-line FSCK; parity RAID, (RAID-5/6); backup, like ZFS Send / Receive.

Color me unconvinced. It feels way too much like a one-man show (not that it is, strictly speaking).

Well, yeah, but seamless tiered storage, guys…

(Maybe the OpenZFS developers can steal that idea and implement it? It would be a feature that could see real-world use and great performance increases.)

BTRFS is supported by Synology NASes; anecdotally, I have one and it works. It is an off-the-shelf packaged solution, so I am not sure this is what you were interested in. Mostly what I want to say is “if you need to find a large group of BTRFS users, look for Synology NAS customers”.

IIUC, Synology uses btrfs for its filesystem, but not as a volume manager–it uses LVM instead for that purpose. If that is correct, then I’m not sure Synology gives a very good picture of btrfs as it’s intended to be used.

Yes, this is correct. The disks are partitioned, partitions are merged into MD-RAIDs then into an LVM pool, then volumes are carved from the LVM pool, then volumes are formatted with BTRFS. So BTRFS used in Synology is always single-disk (no use of native BTRFS RAID).

Last I looked into BTRFS it sounded like it still had all the issues it’s always had.

I.e., RAID5/6 is busted.

(For Winnie)

:fr: :wine_glass: :smoking:

I’ve used BTRFS without hitting ENOSPC errors or losing data, but see r/BTRFS for tales of woe. BTRFS is the default on some desktop distros - Spiral, Siduction, Fedora Workstation and OpenSUSE (home on XFS) - and an install option for others. BTRFS on root doesn’t require jumping through hoops as ZFS does. A combination of grub-btrfs and snapper gives you something akin to boot environments. Note that BTRFS does not have an explicit “rollback” command. BTRFS is popular with desktop/laptop users who like to endlessly tinker and/or are using bleeding-edge distros, in which I’d include Arch Linux and its derivatives.

The sole BTRFS NAS project I know of is “Rockstor”, which is currently based on OpenSUSE. BTRFS has its quirks, and all multi-device profiles are stable bar RAID5/6, which remain in a perpetual “work in progress, use at your own risk, not ready for production” state. There seem to have been no great changes since this 2020 post: How to use btrfs raid5 successfully(ish) - Zygo Blaxell. See here for some nitty-gritty user testing comparing ZFS RAIDZ1 to BTRFS RAID5: Battle testing ZFS, Btrfs and mdadm+dm-integrity

The thing to keep in mind is that in BTRFS multi-device profiles, redundancy is at the “chunk” level, not the device level. So, for example, a RAID1 profile simply means two copies of data & metadata on two different array devices. If you take a simple two-device mirror and one disk drops out or fails, then things chug along as BTRFS writes new data in SINGLE mode, and if you check the filesystem device stats, errors pile up. BTRFS does not have any kind of daemon monitor, so there’s no built-in mechanism to alert you that a disk has failed. As BTRFS tends to turn read-only when there are problems, on next boot the degraded simple mirror will not mount rw - so not much of a RAID then.

After a physical disk replacement it requires user intervention to mount the BTRFS filesystem ro in order to do a BTRFS device replace, followed by a scrub. But you can be left with a mix of SINGLE and RAID1 mode data/metadata, so a BTRFS balance is also required in this case. Hence you’re better off having the “minimum number of disks + one” when using a RAID1 profile.

BTRFS is supposedly popular with homelabbers because of ease of expansion - just add another device and re-balance. Expansion sounds simple and it is, but don’t get caught out by thinking that adding a 10TB drive to a RAID1 array with two existing 4TB drives gives you a total of 50% of 18TB usable storage, because, you’ve guessed it folks, you only get to use a max of 8TB of your shiny new drive after you re-balance. Before I forget, getting accurate figures on available space on a BTRFS filesystem, particularly when there are many snapshots, is not straightforward.
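The 4TB + 4TB + 10TB arithmetic can be checked with a few lines of Python. This is only a rough sketch of a simplified model: it assumes a two-copy RAID1 profile, ignores metadata overhead and chunk granularity, and the function name `raid1_usable` is made up for illustration.

```python
def raid1_usable(sizes_tb):
    """Estimate usable space (TB) for a two-copy, RAID1-style profile.

    Each chunk needs two copies on two different devices, so usable
    space is capped both by half of the raw total and by how much the
    remaining devices can hold to pair with the largest one.
    """
    total = sum(sizes_tb)
    largest = max(sizes_tb)
    return min(total / 2, total - largest)


# Compare a plain mirror, the mismatched case from the post, and a
# layout where the third drive matches the combined size of the others.
for drives in ([4, 4], [4, 4, 10], [4, 4, 8]):
    usable = raid1_usable(drives)
    print(f"{drives}: about {usable:g} TB usable of {sum(drives)} TB raw")
```

Under this model the 10TB drive can only be paired with the 8TB that exists on the other two drives, so the remaining 2TB of it sits idle until another device is added.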

That’s my BTRFS experience in brief. Others may disagree, but even with caveats, I find it hard to recommend using BTRFS for storage for more than say 3 or 4 drives using a RAID1 profile, assuming you can live with 50% space efficiency. I’ve not touched on performance as that’s another rabbit hole to go down.
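On the point about there being no built-in monitor: one workaround is to check the per-device error counters periodically, for example from cron. Below is a minimal sketch, assuming the usual `[device].counter value` line layout printed by `btrfs device stats`; the sample text is invented for illustration (not captured from a real system), and the helper name and mount point are hypothetical.

```python
import re

# Matches lines such as "[/dev/sdb].write_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_text):
    """Return (device, counter, value) for every counter above zero."""
    problems = []
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) > 0:
            problems.append((match["dev"], match["counter"], int(match["value"])))
    return problems


# Invented sample in the assumed format, for illustration only.
SAMPLE = """\
[/dev/sdb].write_io_errs    0
[/dev/sdb].read_io_errs     0
[/dev/sdc].write_io_errs    12
[/dev/sdc].corruption_errs  3
"""

for dev, counter, value in nonzero_counters(SAMPLE):
    print(f"WARNING: {dev} reports {counter} = {value}")

# On a real system the text would come from something like:
#   subprocess.run(["btrfs", "device", "stats", "/mnt/pool"],
#                  capture_output=True, text=True).stdout
# where /mnt/pool is a hypothetical mount point.
```

What to do with the warnings (mail, push notification, etc.) is left open; the point is only that the counters are easy to poll.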

I forgot to mention that you have support for shadow copies in smb shares via the vfs_btrfs object. There is also server side copy support both in SMB/CIFS and NFSv4. This is both within and across subvols. Someone will correct me if I’m wrong, but IIRC zfs blockclone in Linux does not give server side copy across datasets, or locally.

I used a Synology with an early version of BTRFS and also rolled my own bcachefs Raspberry Pi Time Machine receptacle at one point. Both experiences were so awesome that neither system is being used here anymore.

I don’t think I lost any important files on the Synology thanks to backups but the mounting stories of BTRFS woes in Synology-land shook my confidence. I also recall that the Synology implementation didn’t use BTRFS for all aspects of the RAID experience since the RAID implementation in BTRFS was quite broken at the time. That may have been addressed since. But it gave the appearance of BTRFS as implemented by Synology to be a bit of a chewing gum and chicken-wire solution.

My biggest worry with Synology’s implementation was the prospect of silently propagating file errors into my backup streams, followed by not having any good copies left. That’s what drove me to FreeNAS / TrueNAS. Between scrubs, checksums, etc. I now have confidence that the data on the NAS is good for use as well as backup.

I have no idea why the combination of Pi + bcachefs was unstable as a Time Machine server. But a Pi can work really well in that capacity when using plain vanilla ExtFS. So I ditched bcachefs also. It’s a bit slower but Time Machine doesn’t try to break speed records, so it’s OK. I get about 50MB/s out of that rig when I back it up.

I think it’s worth repeating that AFAIK Synology has never used native BTRFS RAID profiles. They simply slapped BTRFS on top of their existing mdadm/LVM methods, enabling them to offer scrubs and snapshots. They needed to have RAID5, which BTRFS couldn’t do reliably.

3 Likes

As I understand it, any userland application that calls copy_file_range() will automatically take advantage of block cloning, as applicable. I also seem to recall that Samba tries to use copy_file_range().
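For anyone curious what such a call looks like from userland, here is a minimal sketch using Python’s `os.copy_file_range()` (Linux only, Python 3.8+). The helper name `copy_file` and the fallback policy are my own assumptions, not how Samba or cp do it. Note that the call does not report whether blocks were cloned or copied.

```python
import os
import shutil
import tempfile


def copy_file(src_path, dst_path, chunk=1 << 30):
    """Copy src to dst, letting the kernel clone blocks if it can."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        try:
            while remaining > 0:
                # Uses and advances the kernel file offsets of both fds.
                copied = os.copy_file_range(
                    src.fileno(), dst.fileno(), min(remaining, chunk)
                )
                if copied == 0:
                    break  # source hit EOF earlier than expected
                remaining -= copied
        except (AttributeError, OSError):
            # Call missing or refused: redo the copy in userland.
            src.seek(0)
            dst.seek(0)
            dst.truncate()
            shutil.copyfileobj(src, dst)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    duplicate = os.path.join(tmp, "duplicate.bin")
    with open(original, "wb") as f:
        f.write(os.urandom(64 * 1024))
    copy_file(original, duplicate)
    print("sizes match:", os.path.getsize(original) == os.path.getsize(duplicate))
```

Whether this ends up as a block clone depends on the filesystem and kernel underneath; on anything without clone support it is simply an in-kernel copy.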

The one exception is if you’re using encryption. In that case, block-cloning will only work when issuing a copy command within the dataset itself.

The hidden benefit of block-cloning (if it’s enabled :wink:) is that you can “selectively” restore individually deleted files from snapshots, without consuming additional space. (Sadly, it doesn’t work for encrypted datasets.)

Still off topic re: copy_file_range(). There was a recent post by OpenZFS dev Rob Norris - robn - on r/zfs that said this:

copy_file_range() is “copy this file, and I don’t care how you do it”. This call can use clones to do this, but if cloning is not possible, instead of failing as FICLONERANGE would, a regular content copy is done instead. cp --reflink=auto uses this. It is available on FreeBSD and used by default by its builtin /bin/cp. On Linux it can clone across datasets, but may not if it decides to fall back. There’s no way to tell directly whether it used clones or not to service the request, because the API contract is "I don’t care how you do it".

My emphasis.

robn had also said this about a year ago:

Yep, this. OpenZFS mainline gained support for block cloning (the enabling tech for “reflinks”) and the necessary OS wiring (copy_file_range) for FreeBSD in March, while the wiring for Linux (copy_file_range, FICLONE, FICLONERANGE) landed just two days ago. Neither have been in a release yet; they will be in 2.2.0.

If you use cp and mv from GNU Coreutils 9.x, you’ll find that --reflink=auto will produce clones across datasets (with fallback to copy in some limited situations). --reflink=always will not work across datasets; this is a Linux limitation, not OpenZFS (if Linux ever lifts this restriction it will begin working immediately).

(Extra irony: --reflink=always will work across datasets if you use a pre-4.5 kernel).

Like you, I stopped using it when I found I could use ZFS. At that time I was on Unraid and was becoming frustrated with a choice of only BTRFS and XFS. BTRFS was the default for the cache and you could use XFS, but only BTRFS could be mirrored, I think. Ultimately I became tired of repeated BTRFS data corruption (I stuck with it for maybe 2 years with 3 data loss events) and switched. I would love BTRFS to be a viable alternative, and perhaps now it is, but the problem is that trust is lost fast with a file system. So my 2c to add to your question would be: what new changes have been put in place to make me want to try it again? Hard ask though, with ZFS as good as it is.

Interesting, I always assumed that since a big player like Synology had settled on BTRFS, they must have added some secret sauce to make it run well. But it sounds like they didn’t, and you got what I suppose we could say was a not entirely uncommon experience. What a shame. I hear a lot of good things about Synology, but based on it running BTRFS I have avoided it so far. I half wonder whether the only way forward for BTRFS would be to have a fork made and give it a new name, as its reputation is a bit less than ideal and that’s hard to get away from.