Encrypted dataset performance: A statistical experiment

Hi all,

I was curious about how much of a performance hit I was taking by using encrypted datasets on a spinning HDD. So I created two new datasets, one encrypted and one not, and ran this script to compare read/write speeds. There are certainly many limitations and caveats to this analysis, and I did this mostly out of curiosity / for fun, but the results are still quite interesting!

You can find the script here:

Overview

The BenchmarkDrive script is a bash utility for benchmarking HDD performance in a TrueNAS SCALE storage pool. It measures sequential write and read speeds by writing random data from /dev/urandom to test files on the specified dataset and then reading them back. The script runs for a user-defined number of iterations and outputs two files:

  • A CSV file containing the read and write speed for each iteration.
  • A TXT log file that stores details about the run, such as the dataset directory, file size parameter, iteration count, and timestamp.

Usage

Save the script as benchmarkdrive.sh (or another preferred name). Ensure the script is stored on a pool other than the boot pool, since /home on the boot pool is mounted with the noexec option.

Run the script with the following syntax:

sudo ./benchmarkdrive.sh <dataset_directory> <count> <iterations>

Where:

  • <dataset_directory>: The mount point of your dataset (e.g., /mnt/pool/dataset).
  • <count>: The size of the test file in MiB (dd uses bs=1M; e.g., 1000 for a roughly 1 GB file).
  • <iterations>: The number of times to run the test (e.g., 30).

Example:

sudo ./benchmarkdrive.sh /mnt/pool/dataset 2000 100

This command benchmarks /mnt/pool/dataset by writing and reading a 2GB file for 100 iterations.

Troubleshooting

  • Permission Issues:
    If you encounter “permission denied” errors, ensure the script is stored on a filesystem without the noexec restriction (e.g., not in /home) and that you are running it with sudo.
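
A quick way to check is findmnt, which shows the mount options for the filesystem backing a given path (-T resolves the path to its mount point):

findmnt -T /home -o TARGET,OPTIONS

If the OPTIONS column lists noexec, scripts stored there cannot be executed directly.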

Method

For each iteration, the script executes the following steps (a minimal sketch of the core loop follows this list):

  • Write Test:

    • A test file named testfile_X (where X is the iteration number) is created in the specified pool directory.
    • The script uses dd with /dev/urandom to write random data to the file. The options used (bs=1M, count=<count>, and oflag=direct) request direct I/O (though, as discussed later in the thread, this does not bypass the ZFS ARC).
    • The output from dd is captured, and the write time (in seconds) is extracted using a sed regular expression.
  • Read Test:

    • The script reads the test file back with dd, sending the output to /dev/null and using iflag=direct.
    • The read time is similarly extracted from the dd output.
  • Cleanup:

    • The test file is removed after both tests.
    • A new row with the iteration number, write time, and read time is appended to the CSV file.
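
To make this concrete, here is a minimal sketch of what the core loop looks like. Treat it as illustrative rather than the actual script (the variable names are hypothetical), and note that the sed pattern assumes GNU dd’s “copied, X s” summary line on SCALE:

#!/usr/bin/env bash
# Minimal sketch of the benchmark loop (illustrative, not the full script)
dir="$1"     # dataset mount point, e.g. /mnt/pool/dataset
count="$2"   # test file size in MiB
iters="$3"   # number of iterations
csv="results.csv"

echo "iteration,write (s),read (s)" > "$csv"
for i in $(seq 1 "$iters"); do
    f="$dir/testfile_$i"
    # Write test: random data, direct I/O requested
    wout=$(dd if=/dev/urandom of="$f" bs=1M count="$count" oflag=direct 2>&1)
    wtime=$(printf '%s\n' "$wout" | sed -n 's/.* copied, \([0-9.]*\) s.*/\1/p')
    # Read test: read the file back, discard the data, direct I/O requested
    rout=$(dd if="$f" of=/dev/null bs=1M iflag=direct 2>&1)
    rtime=$(printf '%s\n' "$rout" | sed -n 's/.* copied, \([0-9.]*\) s.*/\1/p')
    # Cleanup and record this iteration
    rm -f "$f"
    echo "$i,$wtime,$rtime" >> "$csv"
done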

Results

Test Parameters

  • File Size: 2GB
  • Iterations: 100
  • Pool: Mirror configuration on 2x Toshiba N300 6TB drives
  • Datasets: Both the encrypted and unencrypted datasets were created on the same pool with LZ4 compression and no dedup.
  • Encryption method: AES-256-GCM

Sample Statistics

Operation | Dataset     | Average Time (s) | Standard Deviation (s)
----------|-------------|------------------|-----------------------
Write     | Encrypted   | 5.10             | 0.77
Write     | Unencrypted | 4.69             | 0.77
Read      | Encrypted   | 0.513            | 0.33
Read      | Unencrypted | 0.263            | 0.062

Analysis

Two-sample t-intervals were calculated for the difference in mean times between the encrypted and unencrypted datasets (the formula is sketched after this list):

  • Write Time Difference (Encrypted - Unencrypted):

    • Point Estimate: 0.411 seconds
    • 95% Confidence Interval: (0.198, 0.625) seconds
    • Interpretation: Write operations on the encrypted dataset are between 4.2% and 13.3% slower than on the unencrypted dataset.
  • Read Time Difference (Encrypted - Unencrypted):

    • Point Estimate: 0.250 seconds
    • 95% Confidence Interval: (0.184, 0.317) seconds
    • Interpretation: Read operations on the encrypted dataset are between 69.7% and 120.4% slower than on the unencrypted dataset.
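
For completeness, these are standard two-sample (Welch) t-intervals. Sketching the write interval, with $n_E = n_U = 100$ samples per dataset and a critical value of $t^* \approx 1.97$ (my assumption for roughly 198 degrees of freedom):

$$
(\bar{x}_E - \bar{x}_U) \pm t^{*}\sqrt{\frac{s_E^2}{n_E} + \frac{s_U^2}{n_U}}
= 0.41 \pm 1.97\sqrt{\frac{0.77^2}{100} + \frac{0.77^2}{100}}
\approx (0.20,\ 0.63)\ \text{s}
$$

which agrees with the interval above up to rounding.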

Observations

  • Write Performance:
    The encrypted dataset exhibits a moderate slowdown in writes, taking approximately 0.41 seconds longer on average. This corresponds to a 4.2%–13.3% increase in write time: noticeable, but not extreme.

  • Read Performance:
    The penalty for read operations is far more dramatic. The encrypted dataset’s average read time is nearly double that of the unencrypted dataset, with an overhead ranging from 69.7% to 120.4%. Additionally, the standard deviation of the encrypted dataset’s read times (0.33 seconds) is much higher than that of the unencrypted dataset (0.062 seconds), suggesting not only slower performance but also greater variability and inconsistency in read speeds.

Conclusion

On its face, this benchmark suggests that encryption introduces statistically significant performance penalties:

  • Write operations on the encrypted dataset are modestly slower.
  • Read operations, however, suffer a substantial performance hit—both in average speed and consistency.

However, it is unclear what is causing this performance difference. There are many factors at play here, and a deeper analysis would be required to understand exactly what’s happening.

What hardware did you test this on?

Ah, yes, forgot to mention that. My homelab runs on an old gaming PC with an Intel i7-8700K CPU and 32GB DDR4-3000 memory. The storage drives are connected with standard 6.0 Gb/s SATA.

I get a blank output file (.csv) from the script.

I ran it with:
./benchmarkdrive.sh $PWD 1000 20

This is the output it gave me, after it finished 20 iterations:

iteration,write (s),read (s)
1,,
2,,
3,,
4,,
5,,
6,,
7,,
8,,
9,,
10,,
11,,
12,,
13,,
14,,
15,,
16,,
17,,
18,,
19,,
20,,

On a second attempt, I confirmed it was indeed creating the 1-GiB test files in the working directory, as expected. But neither the .txt log nor the .csv saved any of the measurements.


EDIT: I’m on CORE. That’s likely it. I should be using gsed instead of sed, since FreeBSD’s sed is different from GNU’s.

Ah, I would not have thought about that! I did write and test this on SCALE - no experience with CORE.

I haven’t fully made this “CORE-compliant”, but I got it working well enough that I can see the numbers.
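
For anyone else on CORE, the culprit is the dd summary line: GNU dd prints “… copied, 5.10 s, 421 MB/s” while FreeBSD’s dd prints “… bytes transferred in 5.10 secs (…)”. A pattern along these lines (just a sketch of what I’m using, not what the script ships with) matches both:

wtime=$(echo "$wout" | sed -n -E 's/.*(copied,|transferred in) ([0-9.]+) s.*/\2/p')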

What immediately drew my suspicion was the ARC.

In your results, the read speeds do not make sense for spinning hard drives: reading a 2 GB file in roughly 0.26–0.51 seconds works out to about 4–8 GB/s. The blocks are being read purely from RAM.

To demonstrate this, redo your benchmarks with the following change to both datasets:

zfs set primarycache=none pool/dataset1
zfs set primarycache=none pool/dataset2

You’ll notice a huge difference for the read speeds. :wink:

If you want to see a “pure HDD” for the write speeds, also apply these changes:

zfs set sync=always pool/dataset1
zfs set sync=always pool/dataset2

I predict that after applying the primarycache and sync properties to both datasets and redoing the benchmarks, you’ll see something much closer to the true overhead of encryption.


EDIT: If you want to simulate something closer to a typical home user’s setup, you can leave the sync property at its default, since most of the time people are not using sync=always, and access is usually over SMB anyway.
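
(And when you’re done benchmarking, zfs inherit will revert the properties to their inherited/default values; this assumes you hadn’t explicitly set them on these datasets before:)

zfs inherit primarycache pool/dataset1
zfs inherit primarycache pool/dataset2
zfs inherit sync pool/dataset1
zfs inherit sync pool/dataset2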

Oh dang, that’s clever! I thought I’d avoided caching by setting the oflag=direct / iflag=direct options for dd, but I’m not proficient enough with ZFS to have thought of what you suggested. Will give it another go when I get home, thanks!

Well hot dang, you were right! Results of the new analysis (now brought to you by Python), with the cache disabled as you suggested:

— WRITE —
encrypted: Mean = 4.4558 s, Std Dev = 0.0277 s
unencrypted: Mean = 4.4799 s, Std Dev = 0.2596 s
Difference (encrypted - unencrypted): -0.0241 s
95% Confidence Interval: (-0.0759, 0.0277) s
Conclusion: No significant difference between the two datasets.

— READ —
encrypted: Mean = 16.0379 s, Std Dev = 0.5200 s
unencrypted: Mean = 15.9253 s, Std Dev = 0.6489 s
Difference (encrypted - unencrypted): 0.1126 s
95% Confidence Interval: (-0.0514, 0.2766) s
Conclusion: No significant difference between the two datasets.

Now the read speeds are atrocious - but at least equally so :joy:

That’s what I figured.

Spinning HDDs are not fast enough to really feel the effects of encryption overhead when reading data.

As for reading (and decrypting) from the ARC? RAM is so fast, and since you’ve already bypassed the nonvolatile storage, you’re not likely to notice any performance issues with encrypted datasets.

With modern CPUs and AES acceleration, most people won’t feel any slowdowns from ZFS encryption.
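
On SCALE, you can confirm the CPU advertises the AES instruction set with a quick look at /proc/cpuinfo; this prints “aes” if the flag is present:

grep -m1 -o aes /proc/cpuinfo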

However, the other warnings and caveats about encryption still stand.

It’s funny because the original statistics are technically still valid: there was a statistically significant difference in read speed when reading from the ARC, but those reads are so fast to begin with that the difference isn’t noticeable! Since the read times were tiny, the (also tiny) encryption overhead became a much larger proportion of the total operation time and registered as significant. Stats are funny that way. Thanks for your help in getting to the bottom of this!

Hey, don’t feel bad, @HoneyBadger. Sometimes we overlook things when creating custom synthetic benchmarks. Happens to the best of us.

/puts on glasses

Oh, that’s a cat in your avatar! You mean this entire time you were…

Numbers and percentages, sure. But for a human? You rightly noted this too:


It reminds me of the early days of “upstart”, and how Linux users were becoming so obsessed with “faster boot times”.

I would hope someone’s happiness doesn’t balance on a matter of how many seconds were saved when they power on their PC. What are they going to do with that extra spare time? Stare at the desktop wallpaper as they think “What do I want to do on my computer today?” :grin: