Is ZFS prone to this kind of corruption? I got the impression it was supposed to be pretty stable, but with all the small things others say can mess up the whole pool, it seems pretty finicky? I have never had hardware RAID lose the whole array over something like that.
Could you elaborate on the full setup you are using? I assume it’s all directly connected disks, but how many disks are you connecting in total?
I would say cost is relative to the buyer, market, etc. as I see 24 disk SAS2 backplanes for what I would personally consider very cheap.
It is 24 SATA disks connected with SATA cables via an Areca card.
Any backplanes in particular to suggest? To fit in the case, it being 5.25″ would be an upside.
BTW, are there a lot of non-genuine LSI cards on eBay? That would be a downside of that solution.
That’s one way to look at it.
A counter would be that if your first choice when looking for hardware is products shipped directly from China, you would probably benefit from revising your methodology.
Looking for used enterprise equipment that’s resold locally would be my first choice. There are plenty of sellers offering such items at good prices in North America and in Europe. Either way, you will need to do your own research to make sure you buy a card that can be used in IT mode.
ZFS isn’t particularly prone to corruption; rather, quite the opposite. But that does assume you are following the guidelines. Many people choose ZFS because of its data integrity features. To me it seems ill-advised to pick ZFS and then go with gear that undermines it. One could make a similar argument about the use of non-ECC RAM.
The advice from the others in this thread is sound; they are users with many years of experience.
But it’s not their job to convince you. Ultimately it’s your data and your choice.
Well, my first choice is to use the hardware I already have :). Buying from places other than China usually carries a pretty hefty postage surcharge, which further increases costs. It still might be the better choice, but that makes keeping what I have more tempting.
Good to hear; it just started to sound that way, as if anything not going exactly to plan would corrupt a whole pool. I am not saying the advice is bad. But to be honest it does seem to be based more on “we don’t know the quality” than on “we have experienced the quality to be bad”. Which of course can also be a reasonable approach. And while it certainly can be the best choice, I feel that almost whatever you ask for help with online, a lot of people are always going to suggest “buy new equipment instead of using what you have”.
BTW, do you happen to know if this “elevator seeking” is the same as TCQ/NCQ? It sounds very similar to me. (And it seems a bit weird if ZFS doesn’t handle it; don’t all modern drives use it by default?)
As I said, the choice is yours. I think it’s fair to say that the replies here have been made to help you make an informed decision on how to proceed. Like you, I also give value to using gear one already has, but that doesn’t change the fact that some gear just isn’t suitable for every use case and ignoring conventional wisdom can lead to undefined behaviour. If your data isn’t important or you have backups and like to tinker, go for it. Any success or mishap will be yours to reap, or suffer.
New server hardware is pricey, so I advocate reusing old stock dumped on the used market. If your user name is any indication, you might be able to pick up good deals on eBay from resellers in Germany, as an example; 3008/9300 controllers come in many different forms. Perhaps you can find a bargain?
I don’t know much about command queuing, but the literal Wikipedia article on it does say that it can be referred to as elevator seeking. I think the article also makes a good case why it’s not the right solution for every use case - it’s a bit of a black box.
It appears SATA NCQ actually implements out-of-order writes, breaking the in-order writes that ZFS wants:
That said, it may implement write barrier commands properly. Don’t know.
There may be a way to test out-of-order (aka elevator seek) writes, but I don’t know it. However, in-order writes are a requirement for ZFS to have “perfect” data integrity.
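I don’t know of a real way to exercise a drive’s queue from user space either, but here is a toy Python sketch (purely illustrative, not an NCQ test; the block names and the simulate_power_loss helper are made up) of why in-order writes matter: if queued writes complete in an arbitrary order and power is lost mid-queue, a commit record can end up durable while the data it depends on is not.

```python
import random

# Toy model: two dependent writes. The "commit record" must never be on disk
# without the "data block" it refers to. A drive that reorders its queue for
# shorter seeks may complete them in either order; a crash mid-queue can then
# leave the later (dependent) write durable while the earlier one is lost.
QUEUED_WRITES = ["data block", "commit record"]

def simulate_power_loss(queue, preserve_order):
    pending = list(queue)
    if not preserve_order:
        random.shuffle(pending)                  # drive reorders freely
    completed = random.randint(0, len(pending))  # crash part-way through
    return pending[:completed]

for trial in range(5):
    on_disk = simulate_power_loss(QUEUED_WRITES, preserve_order=False)
    broken = "commit record" in on_disk and "data block" not in on_disk
    print(f"trial {trial}: on disk = {on_disk}, inconsistent = {broken}")
```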
ZFS is no different from other file systems when used with less than ideal hardware: corruption can occur. HOWEVER, many, MANY times ZFS will DETECT that corruption while another file system will simply serve up bad data, or fail a file system check on the next reboot. Thus, ZFS appears to have more problems, when the truth could be as simple as not knowing about the other file system’s corruption.
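To make that detection point concrete, here is a minimal Python sketch (not ZFS code; write_block and read_block are made-up names) of checksum-on-read: because the checksum is kept apart from the data, a silently flipped bit is caught when the block is read back instead of being served up as if it were good.

```python
import hashlib

def write_block(data: bytes):
    """Store a block plus a checksum kept separately from the data itself."""
    return data, hashlib.sha256(data).hexdigest()

def read_block(stored: bytes, expected: str) -> bytes:
    """Verify the block against its checksum before handing it to the caller."""
    if hashlib.sha256(stored).hexdigest() != expected:
        raise IOError("checksum mismatch: corruption detected, data not served")
    return stored

block, checksum = write_block(b"important records")

# Simulate silent, on-disk bit rot that a plain file system would never notice.
corrupted = bytearray(block)
corrupted[0] ^= 0x01
try:
    read_block(bytes(corrupted), checksum)
except IOError as err:
    print(err)  # with redundancy ZFS could also repair it; here we only detect
```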
Also, ZFS was designed for enterprise hardware, meaning no hardware RAID and ECC memory. That said, lots of non-enterprise computers can use ZFS without any issue.
Modern hard drives almost certainly support elevator seeking. On the other hand, they also support write barriers, which require the prior write to complete before going on to the next queued-up write.
The issue is that hardware RAID controller firmware may not honor write barriers, either because it thinks battery-backed RAM is good enough, or because the firmware was simply not coded for write barrier support. HBAs use much more straightforward firmware, passing those things through to the drives rather than processing them themselves.
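For illustration, here is roughly the discipline a file system (or careful application) relies on, sketched in Python with made-up names (journaled_update, journal.log); it assumes os.fsync really forces the data to stable storage. A controller that acknowledges writes from its battery-backed cache and swallows flush/barrier commands quietly voids that assumption.

```python
import os

def journaled_update(path: str, record: bytes, commit_marker: bytes = b"COMMIT\n"):
    """Write data, force it to stable storage, and only then write the commit
    record that refers to it. The ordering guarantee depends entirely on the
    flush (os.fsync) actually reaching the disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)
        os.fsync(fd)                 # barrier: the record must be durable first
        os.write(fd, commit_marker)
        os.fsync(fd)                 # the commit is made durable second
    finally:
        os.close(fd)

journaled_update("journal.log", b"record 1\n")
```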
Here is a quick reference to write barriers:
I am not trying to stop you from using your Areca HW RAID cards in “pass through mode”. Just putting forth that there is risk in the unknown, and if you have problems beyond simple disk failure, that may be the cause.
Hard drives have used NCQ/TCQ for what has to be decades now, and they have their own cache as well. If ZFS can’t handle either of those, one would assume that would hurt performance quite badly?
I understand corruption can occur with any system, but isn’t almost the whole point of ZFS that it is more resilient to corruption? If it only handles ideal hardware, and doesn’t handle extremely common disk features well, then in the practical world it seems less resilient. What it sounds like in this thread is that it only works in ideal conditions and there is no way to check whether the hardware does what ZFS expects it to do. Hardware RAID cards are also designed for the enterprise, and it still isn’t easy to corrupt the whole array, which is different from some data loss.
I must admit this whole thread gives me concern about the overall resilience of ZFS in the real world.
My comment was in direct reference to HW RAID controller cards.
There was a time when hard drives did not flush their write cache properly. That was like 20 years ago, and the recommendation then was to disable the write cache to prevent corruption. But that applied to other file systems just the same. I would guess it is no longer a problem today.
Basically, you kept asking questions that took us into the weeds, low-level details that I am perhaps not the best person to describe. So I will quit now.
If you can’t get the answers you seek about ZFS, either from here or from reading elsewhere, then it’s perfectly fine not to use ZFS. Lots of people think ZFS is overhyped, too complicated, or just not needed for their use case.