Metadata VDEV impact is "not noticeable" / it "does not perform as expected"

So there are a few reasons why having the special vdev isn’t going to show a difference.

First, the job of the special vdev is to store metadata (file and directory info) and maybe small files, depending on config. How did you test it? By copying a single large zip file. How much metadata needs to be updated by that? One file's worth. Basically nothing.
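
For context, how much (if anything) beyond metadata lands on the special vdev is controlled per dataset by the special_small_blocks property. A minimal sketch, assuming a pool called tank with a dataset data (both names are placeholders):

    # send blocks of 16K or smaller to the special vdev as well as metadata
    zfs set special_small_blocks=16K tank/data
    # check the current value (default 0 = metadata only)
    zfs get special_small_blocks tank/data

With the default of 0, copying one big zip file writes essentially nothing to the special vdev, which is part of why that test shows no difference.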

Conversely, I’m hosting storj nodes on my system. We are talking about like 30 million tiny files on a single hard drive and lots of operations that need to traverse every file for statistics. It’s miserably slow. Having a special vdev, or just an L2ARC cache, helps a lot.

Depending on your workload: if you're storing large files, a special vdev won't really help you. Lots of tiny files? Sure, it will help. But, as noted, it introduces another point of failure: now if EITHER drive fails, it will take out your pool.

Also, benchmarking ZFS, and doubly so over a network share, just isn't going to work right with CrystalDiskMark, because the data will get cached in RAM and report crazy high numbers.
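
If you do want numbers that mean something, it helps to watch the ARC while the benchmark runs, or to reduce read caching on the test dataset so reads actually touch the disks. A rough sketch (pool/dataset names are placeholders, and SMB client-side caching on the Windows end will still skew results):

    # watch ARC hits vs misses once per second while the benchmark runs
    arcstat 1
    # temporarily cache only metadata for the test dataset, so data reads hit disk
    zfs set primarycache=metadata tank/bench
    # put it back when you are done
    zfs set primarycache=all tank/bench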

TLDR: maybe you don’t need a special vdev at all.


@EasyRhino What a great explanation and example of the special circumstances that can justify a special vDev.

If you read the beginning of this thread, you will see that my conclusion is that I did test using a folder with small files. It is also my conclusion that CrystalDiskMark really gives a wrong impression (in fact I will never use that program again, having seen its results versus reality).

But next to the L2ARC cache, I think one of the reasons for my disappointment is … Windows. Given that Windows small-file performance (even when using NVMe SSDs) seems to be terrible, the tests I did are seriously affected by that Windows limitation.

Whatever the case, given the limited benefit, I have decided to replace the NVMe metadata vDev with a SATA SSD, because:

  • by only replacing the NVMe SSD with a SATA SSD, I can leave the dataset as it is
  • I still have the limited advantage of the vDev
  • and I can populate the freed NVMe slot with a bigger NVMe SSD for additional fast storage

Welcome to the club :smiling_face_with_tear:

Windows may create performance bottlenecks but that does NOT mean that a metadata vDev is giving you any real benefits whilst adding some very real risks.

I fear that you are not really taking in what people are saying. To reiterate:

  1. With 92GB of memory, your metadata is likely to be permanently held in ARC anyway, in which case a metadata vDev will give very little, if any, benefit (a rough way to check this is sketched after this list).

  2. Every non-redundant vDev you add to a pool increases the risks of failure of the entire pool.
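
On point 1, a rough way to check whether your metadata is in fact already being served from ARC is to look at the ARC statistics after the system has been running for a while, e.g.:

    # overall ARC report - look at the ARC size and the metadata hit ratio
    arc_summary
    # or just the metadata-related lines
    arc_summary | grep -i metadata

If the metadata hit ratio is already close to 100%, a metadata vDev has nothing left to speed up.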

Switching from an NVMe metadata vDev to a SATA SSD one will not change its ineffectiveness.

But if you buy 2x SATA disks and use them as a mirror pair for the metadata vDev, then at least you are providing redundancy and not increasing the risk of data loss even if it still isn’t going to give you any benefit.

But this is your NAS and your decision. However, if you find that you have to come back here for help with data recovery because you didn't listen, then expect an "I told you so".

Protopia,

It is not that I do not understand your remarks, … but …

  1. adding the proposed redundancy would cost almost €1000
    (1 extra 16TB drive (€375) + 1 extra 4TB NVMe (€350) etc., not to speak of a new motherboard, CPU, RAM etc.)
  2. the current motherboard and case do not support those extra devices
  3. the RAM is not ECC
  4. and there is only a single CPU
    And the chance that high-quality Seagate drives will fail is small anyway, etc.

So, your suggestions are NOT "out of the blue", but implementing them would lead to a nearly complete new build and would require an additional investment as high as the price of the current system.

Which should be a lesson for the future: before committing to a build, ask whether anything is wrong with the plan.

This is not necessarily directed at you, but at future readers.

So, which bit about NOT needing a metadata vDev did you not understand??? You do NOT need to buy a second 4TB NVMe drive, nor ECC RAM, nor a new motherboard, CPU etc., and no one has suggested you do.

Clearly you have NOT been listening to the advice of people with more experience than you.

Plus in an earlier post you were suggesting spending money on a SATA SSD and a new larger NVMe - money which would be wasted compared to buying another 16TB drive.

If you defer the purchase of the 3rd 16TB drive until later you may well have a LOT of difficulty turning a non-redundant stripe into a RAIDZ1 without moving the data elsewhere, so avoiding RAIDZ1 now on cost grounds will cost you a lot more time, effort and possibly money in the long run.

(I suppose you could create a RAIDZ1 now using the 2x 16TB drives and a sparse file as the third drive, and then immediately delete the sparse file, leaving a degraded pool which is essentially the same 2x 16TB stripe as you are proposing, only with the ability to replace the missing sparse file with a 3rd physical drive later and resilver to get the redundancy back. But this is definitely not any more recommended than the stripe you are considering, and it does add some additional technicality to the solution.)

Please listen to the advice that I and others are providing you - using our valuable time at no cost to you.


An extra 16TB drive would cost, let's say, around €350, and there is no place for that drive in the case.

Combining the two 16TB drives into one 16TB RAID1 mirror is an option, but then I would lose 16TB of storage capacity. An expensive option, but nevertheless it is something to consider.

I am going to place a 2TB NVMe drive I have available in the second NVMe slot as a separate fast dataset. In that way I have 6TB of NVMe storage available
(yep, I know, not redundant).

Then I have:

  • a 1TB SATA SSD intended as metadata vDev, but I could decide otherwise
  • a 500GB SATA SSD as boot / system disk
  • and also, though it seems to be risky, two Samsung 250GB USB sticks which I intended to set up as RAID1 and use as a system disk

I really do not know; redundancy costs lots (!!) of money, especially in the case of RAID1.

And I really think that a copy on another system (e.g. my old NAS) reduces the risks more than RAID1 does.

ARC is not permanent. Probably a figure of speech, but worth emphasizing.

Yes, if you perform a full directory traversal (say, find /mnt/pool), that will bring in the metadata, and it can then spill to L2ARC, which can be persistent.

But the metadata will be permanently stored on an SSD if it is on a special VDev.

I know that - I meant that once it is loaded it will likely stay in ARC. If you want to preload it, then create a startup script to traverse the directories.
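
A minimal sketch of such a preload script, assuming the pool is mounted at /mnt/tank (path is a placeholder); on TrueNAS it could be added as a post-init script under System Settings > Advanced > Init/Shutdown Scripts:

    #!/bin/sh
    # Warm the ARC with metadata by stat'ing every file and directory once after boot.
    find /mnt/tank -ls > /dev/null 2>&1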

You were previously talking about buying a SATA SSD? Where are you planning to use that? Indeed you are still talking about using a 500GB SATA as a boot drive. Ditch that idea and use this SATA port by buying another 16TB drive.

You really aren’t listening. This is going to do little to improve the performance of your system once it is up and running. Use this SATA port with another 16TB drive.

Use one NVMe as a boot drive (with a new small NVMe card). Use the second NVMe as an app pool (and back it up to HDD using replication).
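
The backup of the app pool can be a scheduled replication task in the GUI; by hand it is roughly this (pool and dataset names are placeholders):

    # snapshot the app pool and replicate it to a dataset on the HDD pool
    SNAP="apps@backup-$(date +%Y%m%d)"
    zfs snapshot -r "$SNAP"
    zfs send -R "$SNAP" | zfs recv -F tank/apps-backup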

Yes, redundancy costs money - but it's worth it, and it is a cost of having large storage.

A copy on another system solves a completely different use case to redundant disks. Redundancy is about availability of your primary copy. Backup is about having a separate copy in case of disaster.

Yes - a backup copy is a great idea, and it avoids the risk of losing most of your data, but when a drive fails you will spend days getting everything back up and running again.


@louis has said elsewhere that he doesn't have a 16TB drive spare to make the RAIDZ1. My point is that you don't need one. You can make a RAIDZ1 using 2x HDD + a 16TB file. But the 16TB file can be sparse, i.e. blank blocks which are not allocated, so the 16TB file actually uses 0TB of disk - and it's a file in an existing pool. When you create the RAIDZ1 using the 2x HDD + sparse file, a few blocks of metadata are written into the sparse file (so it now uses a few hundred MB of disk space). Then you delete the sparse file, and the pool becomes degraded and is effectively the same as a 2x 16TB stripe, EXCEPT that if you add a 3rd 16TB drive it will then resilver back to a redundant set, which you cannot do if you start with an ordinary stripe.
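
A hedged sketch of that sparse-file trick from the shell (disk paths, pool and file names are placeholders; building pools at the command line on TrueNAS is at your own risk, since the GUI normally expects to create pools itself):

    # 1. create a sparse 16TB file on an existing pool - it occupies almost no space
    #    (make it at least as large as the real disks)
    truncate -s 16T /mnt/oldpool/fake16tb.img
    # 2. create the RAIDZ1 from the two real disks plus the sparse file
    zpool create tank raidz1 /dev/sda /dev/sdb /mnt/oldpool/fake16tb.img
    # 3. take the fake member offline and delete it - the pool is now DEGRADED
    zpool offline tank /mnt/oldpool/fake16tb.img
    rm /mnt/oldpool/fake16tb.img
    # later, replace the missing member (by its old path or its guid from
    # zpool status) with a real 3rd disk and let it resilver
    zpool replace tank /mnt/oldpool/fake16tb.img /dev/sdc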

I am now going to stop providing any more answers here (and I would ask @Davvo to close this thread in a couple of days if it gets no more replies).

I hope @louis takes note of the recommendations and does not have to come back cap in hand at a later date asking us to help him out of a hole created because he did not take the advice given here.


Continuing the discussion from Explanation why advocated redundancy is not realistic:

Actually you do not have to do this. It is relatively simple to tweak the TrueNAS install so that it uses (say) the first 16GB or 32GB of the boot drive for the boot pool (rather than the whole drive), leaving the remaining 470GB available to create e.g. an apps pool. (This is what I do on my own NAS.)
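
A sketch of what the second half of that might look like once the install has been limited to the first 32GB (the device name and partition number are assumptions, and the installer tweak itself is not shown here):

    # inspect the boot device and the unused space at the end of it
    parted /dev/sda unit GiB print free
    # create a partition in the free space and build an apps pool on it
    parted /dev/sda mkpart apps 32GiB 100%
    zpool create apps /dev/sda4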

And you can do this with either a SATA SSD or an NVMe drive.

Continuing the discussion from Explanation why advocated redundancy is not realistic:

This is, to put it politely, a complete travesty of what I have been suggesting.

You have suggested spending extra money on SATA SSDs and moaned about having to use a valuable SATA port for a boot drive.

I have suggested a completely different way of using your existing hardware which will be more effective and perform extremely well. And I have suggested that you spend the same extra money you were planning to spend on SATA SSDs on a 3rd 16TB drive.

But you have completely misinterpreted what I have said.

On this basis, this is my very last post in this thread. You are now on your own. Good luck.

Electric Eel allows you to add a 4th (or 5th etc.) drive to a 3-drive RAIDZ1.
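
For reference, that is the OpenZFS RAIDZ expansion feature; in the TrueNAS GUI it is done from the pool's vdev screen, but the underlying operation is roughly (pool, vdev and disk names are assumptions):

    # attach an extra disk to an existing raidz1 vdev and let it expand
    zpool attach tank raidz1-0 /dev/sdd
    # watch the expansion progress
    zpool status tank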

@Protopia has provided a way to create a 3-drive RAIDZ1 with only two drives.

You can do it with 1 drive if you partition it.

EDIT: apparently it also allows 2-drive raidz1.


I did not know that Game of Thrones: Revenge of the Dragons was being produced in Australia…

Don’t you mean welcome to the clubbing? (Dont mean disco either)

My way of creating a 3 drive RAIDZ1 with only two drives:

  1. Can be done on Cobia or DragonFish - it doesn’t need EE;
  2. Results in a degraded non-redundant pool and is therefore only intended to be a temporary measure whilst you move data off a 3rd drive so that you can use it to resilver and get redundancy back short term.

However it's a way to start with what is effectively a striped non-redundant vDev and later add a redundant drive, which you cannot do with a traditional stripe. Non-redundant stripes are not recommended for data which has any importance to you - one of your drives will eventually fail, and you will lose every piece of data on all the drives in the pool!! So if you are going to do this with newish drives that you know are currently in great condition, then do it for a limited time.

The same technique can be used to e.g. create a degraded RAIDZ2 pool which is effectively a RAIDZ1 pool, and later add another drive to upgrade it to RAIDZ2 - you can’t do this any other way either. Running a RAIDZ2 degraded as effectively a RAIDZ1 is not recommended either, but it is less risky than a non-redundant stripe and should work.

A 2-drive RAIDZ1 is equivalent to a mirror, but one where you can add drives incrementally to create a 3+ wide RAIDZ1 which you cannot do with a normal mirror.


@Protopia Based on your concerns, and also concerned by the fact that Amazon is offering an incredible number of refurbished drives … I decided to change the backend storage pool to a RAIDZ1. That was NOT easy and NOT cheap!!!

I bought a hard disk mounting unit (Phanteks HDD mounting bracket, 2x 3.5") which I could mount in my case, and created room for two extra disks that way.

I purchased two extra 3.5" 16TB drives to create the RAIDZ1.
I also bought an adapter to connect the boot SSD via USB
(to free up a SATA port in favor of the RAIDZ1).

  • Kept the original drive as temporary storage for the data.
  • Exported the original pool.
  • Created the RAIDZ1 under the same name as the original, using the other drives.
  • Tried to import the original drive under a new name … not possible via the GUI (stupid!)
  • Via the command line, and after a few reboots, in the end it worked (sketched below).
  • Copied the original files to the RAIDZ1.
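
For anyone following the same route, the command-line part is essentially renaming the pool on import, roughly like this (pool names are placeholders; a pool imported at the CLI is best exported and re-imported through the GUI afterwards so the middleware knows about it):

    # the original pool keeps its old name on disk, so import it under a new one
    zpool import oldpool temp-old
    # if it was not cleanly exported, force may be needed
    zpool import -f oldpool temp-old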