Is it my imagination or are we seeing a LOT more problems with ZFS reporting Read, Write, Chksum errors?
I feel like I am seeing them every day, and some days there are multiple postings. Of course they all scream “Drive Failure” before actually looking into it.
Is this due to TrueNAS gaining in popularity, people building sub-standard systems, or both? I could also see it being due to people just ignoring a problem until it becomes a BIG problem.
I just wanted to know if I’m going crazy. Have reports of ZFS errors been going up, or is it just that, now that I’m retired and can spend more time on the forum, I’m finally noticing them?
When the userbase grows, everything else grows with it, including bug reports, errors, and complaints.
If not for Free/TrueNAS, ZFS for home users would not be as popular and widely used as it is today. Home users are also more likely to take shortcuts and use subpar hardware.
I too feel that the ZFS error reports here in the forums are going up.
But, I think it is a combination of 3 things:
User error
Sub-standard hardware
And last, increased TrueNAS adoption
Some of the new reports lack so much detail that we only find out later that they used SMR disks, or hardware RAID controllers in JBOD mode, or non-server hardware as a server without adequate cooling.
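(A minimal sketch of the kind of detail that helps with these reports; the device path below is just an example and you would substitute your own, but smartctl at least surfaces the drive model and hours so SMR models and worn-out disks can be spotted early.)

```sh
# Hypothetical device path; substitute your own (e.g. /dev/ada0 on CORE).
# Identity info: model, serial, firmware - enough to check the model
# against the manufacturer's published SMR/CMR lists.
smartctl -i /dev/sda

# SMART attributes: power-on hours, reallocated and pending sectors,
# the usual first suspects when ZFS starts reporting errors.
smartctl -A /dev/sda
```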
Oh, we are also starting to see people run TrueNAS as a VM. But without the previously clear configuration guidance we had for VMware, some solutions are not a good choice.
For example, Proxmox seems to be the one that generates the most questions and problems. Part of it seems to be that VMware did not support ZFS, whereas Proxmox DOES natively support ZFS; thus the need to blacklist the TrueNAS devices so that Proxmox does not touch them.
I’d put this one under “User error”. People think virtualization is the way to go. But a clear understanding of both the hypervisor AND the client is REQUIRED for reliable operation when the client needs direct access to devices (storage devices, in TrueNAS’s case).
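(For reference, the usual approach on Proxmox is to pass the whole HBA through to the TrueNAS VM and keep the host’s hands off it entirely. A rough sketch only, assuming a hypothetical LSI SAS HBA; the PCI ID and driver name will differ on your hardware, and IOMMU must already be enabled in the firmware and on the kernel command line.)

```sh
# 1. On the Proxmox host, find the HBA's vendor:device ID.
lspci -nn | grep -i sas

# 2. Bind that ID to vfio-pci so the host never claims the controller.
#    (1000:0087 is an example ID for an LSI SAS2308-based HBA; use your own.)
echo 'options vfio-pci ids=1000:0087' > /etc/modprobe.d/vfio.conf

# 3. Blacklist the HBA driver on the host so Proxmox cannot see the disks,
#    and therefore can never import or write to the TrueNAS pool.
echo 'blacklist mpt3sas' > /etc/modprobe.d/blacklist-hba.conf

# 4. Rebuild the initramfs, reboot, then attach the PCI device to the VM.
update-initramfs -u -k all
```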
iX has said in a bunch of blog posts that adoption rates and installations have never been higher. I think @Captain_Morgan has posted metrics somewhere.
So, given that, I would concur with @Arwen, but I think it’s mostly the last bullet point, because user error and sub-standard hardware have always been things we’ve seen in the forums.
Anecdotally (IIRC), the last few ZFS errors I’ve encountered/engaged on in the forums were on some really old TN 11.x systems that had drives with 60k+ spindle hours (and multiple drives with errors, to boot).
I haven’t seen any alarming trends, @joeschmuck, if that’s why you are asking.
I was saddened by the VMware and ZFS situation, and by free ESXi no longer being viable for new users. I have yet to try Proxmox, but why mess with something that is working? Still, I may put together a third, smaller system just to give Proxmox a trial run, so I can maybe understand what someone is talking about.
Of course, when I started this thread I was not insinuating any fault in the TrueNAS software, and I’m glad no one went down that path.
Hey, on the other side of the coin, I got my first ZFS Cksum error on my spinning rust drive with 52,172 hours on it. Of course it wasn’t the drive; it was self-induced, and I’m glad it popped up when it did, as it helped me with a script I’m working on. The scrub passed; I will clear the error in a few days.
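(For anyone who wants to do the same, a quick sketch; “tank” is just a placeholder pool name.)

```sh
# Show per-device READ / WRITE / CKSUM counters and any affected files.
zpool status -v tank

# Re-verify the data before touching anything.
zpool scrub tank

# Once the scrub comes back clean, reset the error counters.
zpool clear tank
```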
So, dang, I blame it on the Super Moon on 15 November; the timing seems about right.
Agreed. But unfortunately we also see people doing things without understanding what they are doing.
I recall a day when this project was called FreeNAS and iXsystems said that end users would need some basic knowledge of FreeBSD and basic command-line tasks. Somehow that philosophy changed to “Anyone can do it”, likely after the version that will not be named, when clearly there are people who would rather not read a User Guide and “just wing it”, or worse, follow the advice of an AI.
I must be one of the select few lucky unicorns. I haven’t really had any ZFS error issues, besides having to replace one faulty drive, in the last 11 years of using Free/TrueNAS.
Of course, I also don’t run big arrays of drives. Just 6 drives at most.
Hmmmm…
In cases where we have knowledgeable users with these reports, do we have a sense of the ratio of Core to Scale occurrences? Of course, any such analysis would be skewed by the “global” Core-to-Scale install ratio…
I do notice that so many first reports don’t give the TrueNAS version. I always suspect those are from Scale users, Core users likely being aware of the potential for confusion.