Metadata VDEV impact is "not noticeable" / it "does not perform as expected"

Indeed, it is not meant as a revolutionary change. The main appeal of this board is that a German reseller is currently clearing out stock for cheap[1], so it is a good opportunity to get a genuine server board, with IPMI. It does have a TPM header. The single M.2 slot is strictly intended for boot, which is why one needs a riser in the x16 slot for more, but such risers are cheap, and all “data” M.2 drives would then be on CPU lanes rather than chipset lanes.

Irrespective of the board, I’d also suggest consolidating to a single 3- or 4-wide raidz1 pool for all bulk data, with various datasets for different uses. (And thus replacing the case… Two hard drives is not enough for a NAS.)
No special vdev. But you may consider a 512 GB persistent L2ARC for metadata if the workload is mostly reads.
If you have to boot from USB, put a cheap SSD on a USB adapter.
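
If you go the metadata-only persistent L2ARC route, a minimal sketch of the commands involved (assuming a pool named tank, a dataset tank/bulk and a spare SSD at /dev/nvme1n1, all placeholders) would be:

# Attach the SSD as an L2ARC cache device
zpool add tank cache nvme1n1

# Restrict the L2ARC to metadata only for the dataset(s) of interest
zfs set secondarycache=metadata tank/bulk

# L2ARC persistence across reboots is governed by a module parameter,
# enabled by default on recent OpenZFS; check it with:
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled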


  1. they have advertised lower prices than the current €79.99 in the recent past… and accepted even lower offers (€50 or below) ↩︎


FWIW, I’ve had very good experiences with both an sVDEV and a metadata-only, persistent L2ARC. The former requires planning, etc.; the latter gives somewhat slower metadata performance than an sVDEV but is redundant. A metadata-only, persistent L2ARC is only advisable when RAM exceeds 64 GB, however, and there are some limits re: the size of SSD to dedicate to L2ARC vs. available RAM (L2ARC pointers cut into ARC RAM).
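
To see how much ARC RAM those L2ARC headers actually consume on a given box, something along these lines should work (a rough sketch; label wording varies a bit between OpenZFS versions):

# L2ARC section of the ARC report; the header size line is RAM spent on L2ARC pointers
arc_summary | grep -A 20 "L2ARC"

# Or read the raw counter straight from the kernel stats (Linux/SCALE)
awk '/^l2_hdr_size/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats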

Storing metadata on SSDs has a significant performance benefit for tasks that involve a lot of directory traversal, like rsync backups. I doubt you’d see a significant benefit unless the pool is at least 25% full, though: a very small amount of metadata will simply get read into the ARC, and from then on the ARC does all the work, not the SSD.

Even fuller pools may feature very little metadata if you consolidate large collections of files with tools like Apple sparsebundles. My pool’s metadata clocks in somewhere around 0.03% of pool capacity, 1/10th the rule of thumb of 0.3%. But that’s because most of the files are relatively ‘large’ images and videos. Your use case and file types may be quite different.
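
If you want to check that number for your own pool, zdb can walk the pool and break down space usage by block type; a hedged sketch (pool name is a placeholder, the walk is read-only but can take a long time on a large pool, and on TrueNAS you may need to point zdb at the system cachefile with -U):

# Traverse all blocks and print per-type statistics; the summary table
# near the end shows how much space the various metadata types occupy.
zdb -bbbs tank

# On TrueNAS, if zdb cannot find the pool, try:
zdb -U /data/zfs/zpool.cache -bbbs tank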

Lastly, the other reason to go sVDEV (which sets aside 75% of drive capacity for small files, 25% for metadata by default), is that the HDDs are rid of all files that are below the small file cutoff that you can set on a per-dataset basis. That in turn allows you to fine-tune what files / datasets get the full SSD benefit (i.e. databases, VMs, etc.) whereas for others only some small files will (archives). See the sVDEV resource for more info.
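
For reference, the per-dataset cutoff mentioned above is the special_small_blocks property; a minimal sketch with placeholder dataset names:

# Send all blocks up to 64K from this dataset to the special vdev;
# setting it to the recordsize or larger sends every data block to the SSDs.
zfs set special_small_blocks=64K tank/databases

# Other datasets can stay at the default of 0, i.e. only metadata on the sVDEV
zfs get special_small_blocks tank/archives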


This post is more “For Science” than it is a recommendation of any kind.
[image: script timing results from the two systems described below]

Benchmarking a special vdev isn’t exactly straightforward. It’s not like adding additional vdevs, where you can run standard benchmarking suites and understand the performance implications. We’re specifically modifying how metadata is stored in the pool, and if you want to see the impact without introducing extra variables, you should really be testing LOCALLY on the NAS first, before moving to SMB or another sharing protocol.

A really quick and dirty approach is to create a bunch of empty files, timing how long that takes, and then go back through and list the contents of each of those directories.

For giggles and laughs while reading this thread I made a script that does that. You’d want to go into the shell on your NAS and make a folder inside a dataset on your pool. From there use nano to make a file called test.sh, paste in the contents of the script below (Ctrl+O to save, then Ctrl+X to exit), and then type chmod +x test.sh. You can then run the script with ./test.sh

#!/bin/bash

# Number of directories and files
NUM_DIRS=5
NUM_FILES=50000 
MILESTONE=10000  # Milestone to report progress

# Base directory to create all folders in
BASE_DIR="./test_folders"

# Create a base directory
mkdir -p "$BASE_DIR"

# Function to get current time in milliseconds
current_time_ms() {
    date +%s%3N
}

# Measure time to create folders and files
START_CREATE_TIME_MS=$(current_time_ms)

# Initialize milestone times
PREVIOUS_MILESTONE_TIME_MS=$START_CREATE_TIME_MS

for ((i=1; i<=NUM_DIRS; i++)); do
    # Format the directory name with leading zeros
    DIR_NAME=$(printf "%05d" "$i")
    mkdir -p "$BASE_DIR/$DIR_NAME"
    
    for ((j=1; j<=NUM_FILES; j++)); do
        # Format the file name with leading zeros
        FILE_NAME=$(printf "%05d.txt" "$j")
        touch "$BASE_DIR/$DIR_NAME/$FILE_NAME"
        
        # Print elapsed time every 10000 files
        if (( j % MILESTONE == 0 )); then
            CURRENT_TIME_MS=$(current_time_ms)
            CURRENT_MILESTONE_TIME_MS=$((CURRENT_TIME_MS - PREVIOUS_MILESTONE_TIME_MS))
            
            echo "Time for creating $j files in directory $DIR_NAME: $((CURRENT_MILESTONE_TIME_MS / 1000)).$((CURRENT_MILESTONE_TIME_MS % 1000)) seconds."
            
            # Update the previous milestone time
            PREVIOUS_MILESTONE_TIME_MS=$CURRENT_TIME_MS
        fi
    done
done

# Silently track the total elapsed time
END_CREATE_TIME_MS=$(current_time_ms)
CREATE_ELAPSED_TIME_MS=$((END_CREATE_TIME_MS - START_CREATE_TIME_MS))

# Measure time to recursively perform 'ls' in the base directory
echo "Starting recursive 'ls' command timings..."
START_LS_TIME_MS=$(current_time_ms)

# Recursively list all files and subdirectories
(cd "$BASE_DIR" && ls -R > /dev/null)

END_LS_TIME_MS=$(current_time_ms)
LS_ELAPSED_TIME_MS=$((END_LS_TIME_MS - START_LS_TIME_MS))
echo "Recursive 'ls' completed in $((LS_ELAPSED_TIME_MS / 1000)).$((LS_ELAPSED_TIME_MS % 1000)) seconds."

# Measure time to remove and clean up all test files
echo "Starting cleanup..."
START_CLEANUP_TIME_MS=$(current_time_ms)

# Remove all test files and directories
rm -rf "$BASE_DIR"

END_CLEANUP_TIME_MS=$(current_time_ms)
CLEANUP_ELAPSED_TIME_MS=$((END_CLEANUP_TIME_MS - START_CLEANUP_TIME_MS))
echo "Cleanup completed in $((CLEANUP_ELAPSED_TIME_MS / 1000)).$((CLEANUP_ELAPSED_TIME_MS % 1000)) seconds."

# Calculate and print total elapsed time
TOTAL_ELAPSED_TIME_MS=$((CREATE_ELAPSED_TIME_MS + LS_ELAPSED_TIME_MS + CLEANUP_ELAPSED_TIME_MS))
TOTAL_ELAPSED_MINUTES=$((TOTAL_ELAPSED_TIME_MS / 60000))
TOTAL_ELAPSED_SECONDS=$(((TOTAL_ELAPSED_TIME_MS % 60000) / 1000))
TOTAL_ELAPSED_MILLISECONDS=$((TOTAL_ELAPSED_TIME_MS % 1000))

echo "Total elapsed time: $TOTAL_ELAPSED_MINUTES minutes, $TOTAL_ELAPSED_SECONDS seconds, and $TOTAL_ELAPSED_MILLISECONDS milliseconds."

This isn’t testing a real-world scenario, but it should be a way to tease out improvement directly associated with a metadata vdev, I think.

System on the left is an AMD EPYC 7F52 with 5x 7-wide RAIDZ2 vdevs of 8 TiB drives, and the system on the right is an AMD Ryzen 3700X with 2x 2-way mirrors of Optane 905P 960 GB. Neither has a special vdev. Both are on SCALE 24.04.2.

Posting just as a reference point. CPU (single-threaded) performance and other system factors such as RAM speed may play a role here, so direct comparison between different systems might be of limited value. Really the idea is to compare the pool with and without a special vdev for metadata, ON THE SAME SYSTEM, to see if this test can tease out some intelligence.

I’d be interested in seeing the results for a clean fresh pool both without a special vdev and with a metadata special vdev @louis
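
For that comparison, a hedged sketch of how the two throwaway test pools could be built (device names are placeholders, and -f will happily wipe disks, so triple-check them):

# Baseline: plain raidz1 test pool, no special vdev
zpool create -f testpool raidz1 sda sdb sdc sdd
# ... run ./test.sh in a dataset on testpool, note the numbers, then ...
zpool destroy testpool

# Variant: same data vdev plus a mirrored special (metadata) vdev on SSDs
zpool create -f testpool raidz1 sda sdb sdc sdd special mirror nvme0n1 nvme1n1
# ... rerun ./test.sh and compare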

If there’s interest in this topic I can probably come up with better stuff to do; like @Constantin mentioned, rsync is a good one to test.
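
A metadata-heavy rsync test could be as simple as timing a dry run over an existing dataset, which walks the whole tree and stats every file without copying anything; a sketch with placeholder paths:

# Dry run: traverses the source, stats every file, copies nothing
mkdir -p /tmp/rsync-null
time rsync -a --dry-run --stats /mnt/tank/bulk/ /tmp/rsync-null/
# Run it once cold and once warm to see how much the ARC (or sVDEV) helps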


Benchmarking - meaning scientific and repeatable measurements of performance - is actually much more difficult even than this.

For benchmarks to be repeatable they have to have the exact same starting conditions, and with a dynamic ARC that can only be achieved by:

  • flushing the ARC completely immediately before the test starts (see the sketch after this list); or
  • rebooting and running the script immediately after boot completes and load has stabilised; or
  • prerunning the test to populate the ARC with consistent stuff before running again to take the measurements. (Note: If you want to measure ARC effectiveness this is good, but if as in this case you want to measure device performance, this is bad.)
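
For the first option, exporting and re-importing the pool is one way to start from a cold cache for that pool without a full reboot; a hedged sketch (pool name is a placeholder, and anything using the pool has to be stopped first):

# Export drops the pool's cached data from the ARC; re-import starts cold
zpool export testpool
zpool import testpool

# Sanity-check the ARC size before and after
awk '/^size/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats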

I am thinking about how I could improve my system with reasonable effort and cost, and I do not yet know. The main problem always was, and still is, the lack of PCIe lanes.

The only way I could free up PCIe lanes is to give up the graphics card. However, my idea was to also use the machine as a virtual Linux / FreeBSD machine with a GUI (running in a VM). That this is not trivial is clear; e.g. NVIDIA is doing everything they can to restrict that kind of usage to expensive professional cards.

And of course, spending 16 PCIe lanes on the graphics card almost always means wasting costly PCIe lanes. But it is reality.

If I build the next-generation NAS sometime in the future, I will probably choose a motherboard with significantly more PCIe lanes and three x16 slots (one for an extra GPU, one for the 10G NIC and one for extra NVMe), and again a CPU with a built-in GPU. That will however be costly and imply a big case.

For now, however, that is two bridges too far …

Something like this https://www.asrockrack.com/general/productdetail.asp?Model=ROMED6U-2L2T#Specifications.

Too Expensive … :upside_down_face:

No … If I had to start in the near future, I would probably look into the cheapest top-end AM5 board (more PCIe) with the cheapest processor. But even that will be too expensive for a home NAS :frowning:

The PCIe lane count you want is found only on Xeon, EPYC, or Threadripper boards. Plus, using two or three GPUs is not going to be cheap.

In general I am happy with my current NAS, which is IMHO already a top-end home NAS.

If I can improve it with small changes, I will perhaps do that. However, within that limitation, I do understand the comments given.

And if I had needed an enterprise-level NAS, of course I would have used another MB, another case, and RAID arrays.
And perhaps I would have installed a multi-thousand-euro video board :slight_smile:

But for now I will live with my current NAS, perhaps with some upgrades or a changed setup. And my intention is surely not to install n video boards; I would be happy if I could share one cheap one across VMs, that is all.

Note that current AM5 motherboards, and certainly those combined with the top-end chipsets, do have more PCIe lanes than the B550 board I am using now. In fact, for a new build, an AM4 X570 motherboard with a 5700G CPU would IMHO not be a bad choice.

If you want enterprise features for consumer prices, you need to go a few CPU generations back. But consumer boards and CPUs don’t have the features you want, and probably won’t in the near future either.

You cannot share a GPU across VMs: What’s passed through to one VM is no longer available to others.

If what you want is one graphical VM, an MC12-LE0 with x8x4x4 riser can do it: x8 to half-height GPU (does not mind if the card is x16, it will work well enough with 8 lanes), 2 M.2 for apps/VMs, x4 electrical slot for 10 GbE NIC, onboard M.2 to boot, 6 SATA for bulk storage.
Or drop the GPU altogether and pass through the iGPU.
Same principle with an AM5 build.

Sure, but you won’t find any with the three 16x slots you want.

Well… it IS possible, but it will require slicing it up into vGPUs & that would require enabling dev mode & then either some work following a guide or a lot of work following a guide & actually getting it stable. Either way it is in ‘here there be dragons’ territory.

Not something I’d be excited to implement on a NAS, but clutch on a hypervisor that isn’t my main storage vault.


Also, IMHO that is not realistic :cold_face: It would only be realistic if companies like NVIDIA, AMD and Intel made that option available by default, which they don’t, for hardware and … commercial reasons.


Surely an interesting riser card!! From the functionality point of view it is exactly what I am looking for.

However:
It might work, but from a mechanical and electrical point of view … it probably would not make things more stable. Not in any way. Placing a small graphics card on top is probably possible, but whether it is stable …
I am also not sure whether the AM4 B550 BIOS supports this.

From the functional point of view, it is what I am looking for! Nevertheless, I am not convinced.

The 8x4x4 bifurcation cards are supposed to support using a low-profile PCIe card in a full-height slot, with the regular screw-down.

My 1030 is half-height, so in principle it should be possible. But it feels very, very hazy / tricky.

It is of course also a no-name product; I could destroy the whole system using such a card. At least it feels that way.

I read through this entire post.

To all of you who are trying to help Louis: well done. I think style points should be awarded for patience and kindness. Overall a very difficult task, taken on at no cost or revenue.

Bravo

Louis, listen closely to your teachers/gurus here that are assisting you with your issues.

My opinion is you may have started out in the wrong direction and are trying to recover by continuing in the same direction. I say this because it is my normal way of doing things. One day a friend told me to turn around and go back. Even though it is theoretically possible to walk around the world, it is easier to turn around and walk the 50 meters you need to get where you need to be.

I am hugely impressed with everyone involved with this discussion. On most forums there would have been rudeness, meanness and much negativity.


A lack of PCIe lanes is only a problem if you are using your PCIe lanes heavily and having to share them across multiple devices.

With 92 GB of RAM mostly for ARC you should be able to cache almost all metadata plus prefetch read-ahead, and so easily hit a 99.9% cache hit rate. With the remaining I/O going to 2 NVMe + 4x SATA, you probably won’t get anywhere close to maxing out the PCIe lanes you have.
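
If you want to check that on the running system, the OpenZFS tools report the hit rate directly; a quick sketch:

# Rolling view, one line per second for ten samples, including hit percentage
arcstat 1 10

# Or compute the overall ratio from the cumulative counters
awk '/^hits/ {h=$3} /^misses/ {m=$3} END {printf "ARC hit rate: %.2f%%\n", 100*h/(h+m)}' /proc/spl/kstat/zfs/arcstats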
