Damaged disks in a pool and the integrity of the data stored on them.

NOTE: This analysis is based on my (limited) understanding of how ZFS works - and it is possible that I may have got some of it wrong.

This is still over-simplified:

  1. Yes, both SMB writes from Macs and NFS writes from any source are synchronous requests by default. Only SMB from Windows is asynchronous by default. iSCSI writes are also synchronous by default, and so are writes from VMs and Apps.

  2. Regardless of whether you have an SLOG, synchronous writes have an overhead because they happen in two stages - first a write to the ZFS Intent Log (ZIL) and later an asynchronous destaging of the data from the ZIL to the main pool. Asynchronous writes, which are held in RAM until they are written to disk, will always take fewer resources.

  3. However, you can mark a Dataset as asynchronous (sync=disabled), overriding synchronous requests, and on Linux clients you can also mount e.g. an NFS share with the async option (see the example commands after this list).

  4. In the end, what matters is whether the synchronicity is important for data integrity for the specific use cases you have. If you are running a database, where each write is a transaction and you have to know it is committed before e.g. you confirm back to the user, then synchronicity is essential. I suspect it is also pretty essential for zVols used for iSCSI or VMs.

    What is less clear is whether it is necessary for SMB from Macs or NFS writes. I might argue that the only point at which you need to have the write confirmed is at the point that e.g. a file save/copy/move completes. You are probably not concerned with whether the individual blocks are confirmed as written. Unfortunately I am not clear whether ZFS/SMB/NFS distinguishes between these two types of write.

    So, IMO you need to evaluate for yourself for your own use case(s) whether the risk of the NAS crashing soon after you have saved/copied/moved a file outweighs what might otherwise be a significant performance decrease whilst you wait for each block to be individually committed to disk. If you decide that you cannot afford the risk, then an SLOG can be beneficial because a ZIL write to NVMe or SSD is going to be faster than a ZIL write to RAIDZ HDD.

  5. If the risks of data loss on a crash are this important, then you should probably use a mirrored SLOG, because a sudden SLOG disk failure between the ZIL write completing and the data being destaged to the main data vDev will have the same consequences as a crash during an async write, i.e. data may be lost.

  6. Adding an SLOG adds complexity to the configuration, and greater complexity brings its own risks and makes the server harder to manage when something goes wrong.

  7. The performance benefit of an SLOG is only going to be on those occasions when you do synchronous writes - you will need to decide how frequently these will occur and what the time savings will be in order to decide whether an investment in an SLOG is justified.
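
For anyone who wants to experiment with the settings mentioned in point 3 and the mirrored SLOG in point 5, here is a rough sketch of the relevant commands. The pool name (tank), dataset (tank/media), NFS server address and device paths are placeholders, not anything from this thread - adjust them for your own system:

    # Check how a dataset currently treats sync requests
    zfs get sync tank/media

    # Honour sync requests (the default), or ignore them entirely
    zfs set sync=standard tank/media
    zfs set sync=disabled tank/media

    # On a Linux client, mount an NFS share with the async option
    mount -t nfs -o async 192.168.1.10:/mnt/tank/media /mnt/media

    # Add a mirrored SLOG (two fast SSD/NVMe devices) to an existing pool
    zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1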

I see that the topic has really heated up :slight_smile:

However, returning to previous questions and observations:

  • I pulled out the 10GbE network card and the missing RAM is back - all I have to do is buy another 32GB (but that will be in the coming months)
  • I have a Raspberry Pi 4 with Jellyfin, which limits data transfer to about 1Gb/s, and here I would like to stream 3x 4K - that's why I bought the 10GbE card. I have also already bought disks for SLOG and L2ARC, so I will install them; at worst they will not be used to the extent they could be…
  • what exactly does this script do: fetch ftp://ftp.sol.net/incoming/solnet-array-test-v3.sh ?
    You wrote that:
    "I have only included two main runs, a parallel read run and a parallel read run with multiple disk accesses. The script will perform a basic performance analysis and indicate possible problems."
    How does the script check if the disks/vdev/pool are OK?
    Could new disks instead be checked under Windows with e.g. Hard Disk Sentinel, e.g. by testing the disk surface, rather than with this script? Isn't that the point of this test?
    Can I use this script to test disks with data without damaging them?
    Does the script run continuously until it is stopped? How does it signal errors - does it report errors or warnings during the test as they occur, or only after it is stopped?
    Lots of questions :slight_smile: The point is that I won't be able to test the entire empty pool right away… I'll buy another 8TB disk and build a RAIDZ2 as you suggest, and I can test this first vDev, but to add more disks to the pool I have to empty them first, and after emptying the next ones the first vDevs will already be full…

It reads drives with various patterns and reports statistics to help identify bottlenecks.

Anything which stresses the drives goes…
Solnet-array is non-destructive and can run on a live pool. It runs continuously until stopped. Errors would be reported by ZFS and/or appear in SMART reports.
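
If you want to look for such errors while (or after) the burn-in runs, something along these lines works; the pool name and device path are placeholders:

    # Look for read/write/checksum errors on the pool
    sudo zpool status -v tank

    # Check a drive's SMART health and error logs
    sudo smartctl -a /dev/sda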

It's a burn-in test.

Gentlemen. I still need your help :confused:
Among the many 8TB WD Purple drives, it turned out that I also have two of these: "MDL: HUH721010ALE601", but TrueNAS does not detect them :confused:
The drives are new, unused. At first I thought they were damaged, but I connected them to another computer and Windows detected them, so they seem OK.
So I connected them to the computer with TrueNAS again and nothing… it still does not see them… I thought to myself, maybe the controller does not detect them, so I connected them directly to the motherboard of the TrueNAS computer, but still nothing… I kept trying - maybe not enough RAM? The system showed 16GB, so I removed the 10GbE network card and another 16GB of RAM came back - now it is 32GB, but the drives are still not visible…
Do you have any ideas? :confused:
Please helpā€¦

How are the drives powered?
HUH = Ultrastar = 3.3V Power Disable
These drives do NOT spin when connected to a 15-pin SATA connector which provides power to pin 3. They work fine when powered by a backplane or by a 4-pin Molex connector through an adapter. Which can be quite maddening…
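
As a quick check from the TrueNAS shell (assuming the standard Linux tooling shipped with SCALE), you can confirm whether the system sees the drives at all once the power issue is sorted:

    # List block devices with model and size
    lsblk -o NAME,MODEL,SIZE

    # Or let smartmontools scan for attached drives
    sudo smartctl --scan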

It worked. Thank you :slight_smile:


Are you sure you mean this command? Maybe another one can be used?
fetch

Your system does not have fetch (I guess SCALE does not come with it?), try wget.
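
For example, to grab the script quoted earlier in the thread:

    wget ftp://ftp.sol.net/incoming/solnet-array-test-v3.sh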

Gentlemen, I'm trying to run this test but it says the script can't access the disk? Has anyone had a similar problem?

"
Selected disks: sda
WDC WD8004FRYZ-01VAEB0 7.3T disk sda
Is this correct? (y/N): y

You can select one-pass for the traditional once-thru mode, or
burn-in mode to keep looping forever.

One-pass or Burn-in mode? (o/B): o
Performing initial serial array read (baseline speeds)
Thu Aug 29 16:49:08 CEST 2024
Thu Aug 29 16:51:23 CEST 2024
Completed: initial serial array read (baseline speeds)

This test checks to see how fast one device at a time is. If all
your disks are the same type and attached in the same manner, they
should be of similar speeds. Each individual disk will now be
compared to the average speed. Results that are unusually slow or
unusually fast may be tagged as such. It is up to you to decide if
there is something wrong.
!!ERROR!! dd: failed to open '/dev/sda': Permission denied
awk: cmd. line:1: fatal: division by zero attempted
admin@truenas[~]$

"

You probably need to run this as root with sudo.
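
For example (assuming the script was downloaded into your home directory under the same name):

    sudo bash solnet-array-test-v3.sh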

Hello Gentlemen, I am wondering about the bottlenecks of my setup and I have a question: will an SLOG and L2ARC actually be of use in my case? I know that once I add a data vDev to the pool I can no longer disconnect it without destroying the entire pool, but what about the SLOG and L2ARC devices? If something bad happens to them, can I remove or replace them without damaging the pool?

SLOG and L2ARC can be added or removed at any time, irrespective of the data vdev layout.
But you shouldn't worry about "performance"[1] with a pool of known bad disks: You should worry about replacing the bad disks first.


  1. Basic reminders: SLOG is only for sync writes. L2ARC requires at least 64 GB RAM first.
    If your workload is SMB sharing and ARC hit ratio is already over 99%, neither SLOG nor L2ARC will make any difference.
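
As a rough illustration of how little ceremony is involved (pool and device names are placeholders, and arc_summary is the OpenZFS reporting tool available on SCALE):

    # Add an L2ARC (cache) device and an SLOG (log) device
    zpool add tank cache /dev/sdx
    zpool add tank log /dev/sdy

    # Remove them again later - the data vdevs are unaffected
    zpool remove tank /dev/sdx
    zpool remove tank /dev/sdy

    # Check the current ARC hit ratio before spending money on either
    arc_summary | grep -i 'hit ratio'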

So, this thread started out as a data recovery effort and has grown into a system rebuild effort with a concern for speed, which is perfectly fine; however, I think I missed your use case. There was a lot to read here, so I apologize if I missed it.

I only say this because so many people do not understand what each part of a TrueNAS/ZFS system is and how it affects the other parts. Some parts, such as an L2ARC when you are not repeatedly accessing the same data, can actually make the system serve data more slowly. There are also ways to build your pool(s) to make them significantly faster, although they are not always cost effective.

So, knowing exactly what your use case is (what the NAS will be doing) is a pretty important piece of the puzzle. Fewer parts = less complexity. Based on what I have read, I first thought it was an archive; now I'm not so sure. Also, when I read your screen name, I can't help but read it backwards, which of course has no bearing on this conversation.

Regardless of what you want to build, the folks here are willing to help you, I just think it is important to know what you are building.