Getting Started with NVMe over TCP

With the upcoming 25.10 TrueNAS release, I noticed that NVMe over TCP is now available in RC. It’s an interesting technology, so I decided to test it in my small homelab.

After some research, reading the documentation, torturing ChatGPT, and experimenting, I found that the basic setup is actually straightforward. However, securing it properly is more complex. I organized my notes and sample configurations here and also published them on GitHub, which may be more convenient to use than copying from the forum.

I hope this post will eventually be useful to someone. Feedback or corrections from more experienced users are very welcome.

Please also check the official documentation: https://www.truenas.com/docs/scale/25.10/scaletutorials/shares/nvme-of/. It contains more details and UI screenshots for the setup on the NAS side.

Disclaimer

  • This guide is provided without any guarantees, especially regarding security options.
  • The author is new to NVMe over Fabrics, VLANs, security, and network namespaces, and is still exploring the field.

Software and Package Installation

The configuration was tested with the following software versions:

  • TrueNAS: 25.10 RC1 (first release with NVMe over TCP support)
  • Linux kernel: 6.14.0-33-generic (Ubuntu 24.04.1)
  • nvme-cli: 2.8 (libnvme 1.8)

Older versions of nvme-cli and libnvme may also work, but compatibility is not guaranteed.
Install the required packages on the client host:

sudo apt-get update
sudo apt-get install -y nvme-cli iproute2 net-tools

Basic Setup

This unsecured configuration is quick and simple to deploy. It is suitable for testing performance or use within a fully trusted, isolated network. For any production or less controlled environment, consider security measures and check the “Secured Setup” section.

  1. Ensure TrueNAS has a static IP. If unsure, check System → Network.
  2. Navigate to System → Services and enable the NVMe-oF service.
  3. Go to Sharing → NVMe-oF Subsystems → Add in the TrueNAS UI.
  4. In the wizard:
    • Provide a name.
    • Under Namespace, add a previously created Zvol (create one under Datasets first if needed), then save.
    • Click Next, Add, and select Create New.
    • Select your static IP.
    • Finalize the Add Subsystem wizard by clicking Save.
  5. Copy the NQN of the created namespace.
  6. Create an .env config file on the client host at /etc/nvme/nvme-basic.env and fill in the NAS IP and NQN from the previous step:
NVME_NAS_IP=
NVME_NAS_PORT=4420
NVME_NAS_NQN=

(You can also clone the repository and run sudo cp nvme-basic.env /etc/nvme/, then edit the file afterward.)

  7. Create a systemd service at /etc/systemd/system/nvme-basic.service:
[Unit]
Description=Connect NVMe over TCP
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
EnvironmentFile=/etc/nvme/nvme-basic.env
ExecStartPre=/usr/bin/modprobe nvme-tcp
ExecStart=/usr/sbin/nvme connect -t tcp -a ${NVME_NAS_IP} -s ${NVME_NAS_PORT} -n ${NVME_NAS_NQN}
ExecStop=/usr/sbin/nvme disconnect -n ${NVME_NAS_NQN}
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

(Alternatively, copy the file with sudo cp nvme-basic.service /etc/systemd/system/, then run sudo systemctl daemon-reload.)

  8. Start the service and verify NVMe connectivity:
systemctl start nvme-basic
nvme list
nvme list-subsys /dev/{device_name}
  9. Enable the service so it starts automatically after reboot:
systemctl enable nvme-basic

That’s all! You now have an NVMe device that behaves like a local one.

Secured Setup

Key points:

  • TrueNAS provides DH-HMAC-CHAP authentication keys for NVMe-oF, which are used here to restrict access to a specific client host.
  • All communication between the NAS and the client is isolated within a dedicated VLAN.
  • To prevent non-root processes from accessing the NVMe network traffic, the VLAN interface on the client is placed inside a dedicated network namespace.

Client Host Configuration and Service File

Generate the client NQN and copy the provided service file, auxiliary bash script, and .env file containing variables and secrets:

nvme-connect.service

[Unit]
Description=Connect NVMe over TCP in isolated namespace
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
EnvironmentFile=/etc/nvme/nvme-secret.env
ExecStartPre=/usr/local/sbin/nvme-netns-setup.sh
ExecStart=/usr/bin/ip netns exec ${NVME_NS} /usr/sbin/nvme connect \
    -t tcp -a ${NVME_NAS_IP} -s ${NVME_NAS_PORT} -n ${NVME_NAS_NQN} \
    --dhchap-secret=${NVME_CLIENT_KEY} \
    --dhchap-ctrl-secret=${NVME_NAS_KEY}
ExecStop=/usr/bin/ip netns exec ${NVME_NS} /usr/sbin/nvme disconnect -n ${NVME_NAS_NQN}
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

nvme-netns-setup.sh

#!/bin/bash
set -euo pipefail

/usr/bin/modprobe nvme-tcp

CONF=/etc/nvme/nvme-secret.env
[ -f "$CONF" ] || { echo "Missing config: $CONF" >&2; exit 1; }
source "$CONF"

# Sanity checks
: "${NVME_NS:?Missing NVME_NS}"
: "${NVME_IF:?Missing NVME_IF}"
: "${NVME_CLIENT_IP:?Missing NVME_CLIENT_IP}"
: "${NVME_GW:?Missing NVME_GW}"

# Create namespace if not exists
if ! ip netns list | grep -q "^${NVME_NS}\b"; then
    ip netns add "$NVME_NS"
fi

# Move the VLAN interface into the namespace if it is still in the root namespace
if ip link show "$NVME_IF" >/dev/null 2>&1; then
    ip link set "$NVME_IF" netns "$NVME_NS"
fi

# Configure inside namespace
ip netns exec "$NVME_NS" bash <<EOF
set -e
ip link set lo up
ip link set "$NVME_IF" up
ip addr show "$NVME_IF" | grep -q "$NVME_CLIENT_IP" || ip addr add "$NVME_CLIENT_IP"/24 dev "$NVME_IF"
ip route show default | grep -q "$NVME_GW" || ip route add default via "$NVME_GW"
EOF

nvme-secret.env

NVME_NS=nvme-network-namespace
NVME_IF=eth0.50
NVME_VLAN_ID=50
NVME_GW=192.168.50.1
NVME_CLIENT_IP=192.168.50.20
NVME_NAS_IP=192.168.50.10
NVME_NAS_PORT=4420

NVME_NAS_NQN=<GENERATED VALUE FROM TRUENAS>
NVME_CLIENT_KEY=<KEY FROM TRUENAS FOR CLIENT AS PRESENTED>
NVME_NAS_KEY=<KEY FROM TRUENAS IDENTIFYING IT AS PRESENTED>

Installation commands:

nvme gen-hostnqn > /etc/nvme/hostnqn

cp ./nvme-netns-setup.sh /usr/local/sbin/nvme-netns-setup.sh
cp ./nvme-secret.env /etc/nvme/nvme-secret.env
cp ./nvme-connect.service /etc/systemd/system/nvme-connect.service

chmod 700 /usr/local/sbin/nvme-netns-setup.sh
chmod 600 /etc/nvme/nvme-secret.env

Adjust the network configuration (IP addresses, VLAN ID, etc.) in /etc/nvme/nvme-secret.env according to your network setup. The NAS NQN and security keys should be filled in after completing the TrueNAS configuration.

Client Host Network Setup

Stable Network Interface Name

Modern Linux systems use interface names based on firmware or hardware properties, which can differ from traditional ethX naming. To maintain consistency for configuration, explicitly assign a persistent name to the physical interface (e.g., eth0).
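To find the MAC address to put in the rule, you can read it straight from sysfs; a quick sketch:

```shell
# Print every network interface together with its MAC address.
for dev in /sys/class/net/*; do
    [ -f "$dev/address" ] || continue   # skip non-device entries
    printf '%-12s %s\n' "$(basename "$dev")" "$(cat "$dev/address")"
done
```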

Create a udev rule file with the MAC address of the interface:

sudo vim /etc/udev/rules.d/70-persistent-net.rules

Add a line like the following, with your interface's MAC address:

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:00", NAME="eth0"

Then reload the udev rules and reboot:

sudo udevadm control --reload-rules
sudo reboot

VLAN Interface Creation

First we need to create a VLAN interface.
In my case, NetworkManager manages networking, so I use it to create the new interface.

nmcli connection add type vlan con-name "vlan${NVME_VLAN_ID}" dev eth0 id ${NVME_VLAN_ID} ip4 ${NVME_CLIENT_IP}/24

Otherwise, please consult your distribution's documentation.
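For reference, the nmcli command above ends up persisted as a NetworkManager keyfile roughly like the following (a sketch using the addresses from nvme-secret.env; the exact file name and contents may differ between NetworkManager versions):

```ini
# /etc/NetworkManager/system-connections/vlan50.nmconnection (sketch)
[connection]
id=vlan50
type=vlan
interface-name=eth0.50

[vlan]
parent=eth0
id=50

[ipv4]
method=manual
addresses=192.168.50.20/24
```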

TrueNAS NVMe-oF Setup

  1. Navigate to System → Network to configure VLAN and assign a static IP.
  2. Create a Zvol with the desired size and properties.
  3. Navigate to Sharing → NVMe-oF Subsystems → Add in the TrueNAS UI.
  4. In the wizard:
    • Provide a name for the NVMe share.
    • Under Namespace, add the previously created Zvol.
    • In Access Settings, untick “Allow any host to connect”.
    • To restrict access to the client host:
      1. Click Allowed Hosts → Add → Create New.
      2. Enter the host NQN (from /etc/nvme/hostnqn).
      3. Tick Require Host Authentication.
      4. Generate both keys and save them to the client /etc/nvme/nvme-secret.env.
    • To limit access to the dedicated VLAN, add the VLAN port.
  5. Save the configuration.
  6. If NVMe-oF Subsystems is not running, enable it when prompted.
  7. Note the generated NQN for the subsystem in the TrueNAS view and store it as NVME_NAS_NQN in /etc/nvme/nvme-secret.env.

[Optional] Check setup halfway

Check that the network interface is present in the dedicated namespace (assuming the variables from /etc/nvme/nvme-secret.env are loaded into your shell):

sudo ip netns exec $NVME_NS ip a

Discover the announced NVMe subsystems:

sudo ip netns exec $NVME_NS nvme discover -t tcp -a $NVME_NAS_IP -s $NVME_NAS_PORT

Enable NVMe Connection Service

Start the service manually to test the connection and verify the NVMe device:

sudo systemctl start nvme-connect
sudo systemctl status nvme-connect
nvme list

Enable the service to start automatically at boot:

sudo systemctl enable nvme-connect

Filesystem Creation and Mounting

A newly connected NVMe device can be used like a local device in various ways. Here, it is shown being used for a single ext4 filesystem and mounted via an fstab entry.

  1. Identify the newly connected NVMe device using nvme list.
  2. Create an ext4 filesystem:
sudo mkfs.ext4 -E lazy_itable_init=0 /dev/{nvme dev name}
  3. Disable journaling (relying on ZFS on the NAS for consistency):
sudo tune2fs -O ^has_journal /dev/{nvme dev name}
  4. Create a mount point for the desired user:
mkdir -p ~/nvme_mount
  5. Retrieve the filesystem UUID:
blkid | grep {nvme dev name}
  6. Add an entry to /etc/fstab using the UUID (edit with sudo vim /etc/fstab):
UUID={new fs uuid} /home/{user}/nvme_mount ext4 defaults,nofail,_netdev,user,auto,noatime,nodiratime,barrier=0 0 2
  7. Mount it as the user:
mount ~/nvme_mount

Finally, reboot; the NVMe device and filesystem should come up and mount automatically.


Quick, but likely stupid questions.

Is NVMe over TCP just another share protocol like iSCSI, NFS or SMB?
Or is it specific only to NVMe devices, without ZFS on TrueNAS?

(Well, more like iSCSI because NVMe over TCP seems to be for block devices…)


Your github repository is unavailable. Is it perhaps private?

My bad! Changed to public.

Yes, it’s more similar to iSCSI, since it exposes a block device.
It should be faster and provide lower latency, but I haven’t used iSCSI before to compare.


Thanks.

Next question:

Is NVMe over TCP able to multi-path, similar to iSCSI?

Bit confused, are we discussing NVMe-oF or NVMe/TCP?

Before we jump into it, one should also consider the minimum network bandwidth necessary to make this worthwhile over TCP (say with a Gen 3 x4 M.2 drive, as they are very common nowadays).

Fixed title.
As I understand it, NVMe/TCP is one of the transports in the NVMe-oF family of standards, and the community edition of TrueNAS currently provides only the TCP transport.

Yes. Most documentation suggests using high-speed networking (25 Gbps or higher) and a fast storage pool to see real benefits compared to iSCSI.
In my case, it was more about curiosity and finding a way to move a lot of VM images off my desktop storage. It also provides an easy method for daily VM backups using ZFS snapshot and replication tasks for the zvol.
And it seems to work fine even on a measly 2.5 Gb wire. Still, I would not even try to do it over wireless.
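For a rough sense of what 2.5 GbE can deliver at best, a back-of-the-envelope calculation (the ~6% protocol overhead figure is only a ballpark assumption):

```shell
# Raw 2.5 Gbit/s line rate in MB/s, then with ~6% TCP/IP + NVMe/TCP
# framing overhead knocked off.
echo $(( 2500000000 / 8 / 1000000 ))             # 312 MB/s raw
echo $(( 2500000000 * 94 / 100 / 8 / 1000000 ))  # ~293 MB/s usable
```

That lines up with the ~290 MB/s sequential numbers in the benchmarks later in this thread.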

Thank you for the quick clarifications and update.

It would be interesting to know if you are seeing any significant improvement over a 2.5 GbE LAN using this setup. Have you tried any speed test against this particular disk over the network?

It’s far out of my scope (a small playground in a homelab), but Google suggests it should be possible. And I’m not sure if multi-path is supported by the current TrueNAS implementation.
From the documentation I suspect it might be an Enterprise edition feature.

I haven’t used iSCSI before, but now I’m curious to compare it. I’ll try setting it up as well and run some tests. Could you suggest a good way to measure performance?

I’ve only tried fio --name=randread --filename=testfile --ioengine=libaio --direct=1 --bs=4k --rw=randread --numjobs=1 --size=1G --runtime=10 --time_based --group_reporting and compared it with Samba

nvme over tcp

randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.36
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=10.9MiB/s][r=2795 IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=34846: Thu Oct 16 11:48:18 2025
  read: IOPS=2764, BW=10.8MiB/s (11.3MB/s)(108MiB/10001msec)
    slat (usec): min=7, max=222, avg= 9.59, stdev= 2.46
    clat (usec): min=108, max=4034, avg=351.31, stdev=152.20
     lat (usec): min=119, max=4044, avg=360.90, stdev=152.24
    clat percentiles (usec):
     |  1.00th=[  124],  5.00th=[  128], 10.00th=[  178], 20.00th=[  194],
     | 30.00th=[  322], 40.00th=[  355], 50.00th=[  371], 60.00th=[  388],
     | 70.00th=[  400], 80.00th=[  412], 90.00th=[  537], 95.00th=[  545],
     | 99.00th=[  594], 99.50th=[  611], 99.90th=[ 2180], 99.95th=[ 2671],
     | 99.99th=[ 3687]
   bw (  KiB/s): min=10616, max=11576, per=100.00%, avg=11093.05, stdev=206.60, samples=19
   iops        : min= 2654, max= 2894, avg=2773.26, stdev=51.60, samples=19
  lat (usec)   : 250=25.30%, 500=61.83%, 750=12.56%, 1000=0.07%
  lat (msec)   : 2=0.12%, 4=0.12%, 10=0.01%
  cpu          : usr=0.44%, sys=3.45%, ctx=27745, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=27650,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=10.8MiB/s (11.3MB/s), 10.8MiB/s-10.8MiB/s (11.3MB/s-11.3MB/s), io=108MiB (113MB), run=10001-10001msec

Disk stats (read/write):
  nvme2n1: ios=27415/0, sectors=219320/0, merge=0/0, ticks=9519/0, in_queue=9519, util=95.39%

smb

randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.36
Starting 1 process
randread: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=6464KiB/s][r=1616 IOPS][eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=38128: Thu Oct 16 11:53:34 2025
  read: IOPS=1646, BW=6585KiB/s (6743kB/s)(64.3MiB/10001msec)
    slat (nsec): min=6720, max=85409, avg=11352.82, stdev=2470.40
    clat (usec): min=141, max=33597, avg=595.42, stdev=403.79
     lat (usec): min=152, max=33608, avg=606.77, stdev=403.76
    clat percentiles (usec):
     |  1.00th=[  182],  5.00th=[  330], 10.00th=[  367], 20.00th=[  529],
     | 30.00th=[  570], 40.00th=[  578], 50.00th=[  578], 60.00th=[  586],
     | 70.00th=[  586], 80.00th=[  603], 90.00th=[  668], 95.00th=[ 1106],
     | 99.00th=[ 1369], 99.50th=[ 1500], 99.90th=[ 2966], 99.95th=[ 5276],
     | 99.99th=[20579]
   bw (  KiB/s): min= 4944, max= 7368, per=99.83%, avg=6574.74, stdev=691.64, samples=19
   iops        : min= 1236, max= 1842, avg=1643.68, stdev=172.91, samples=19
  lat (usec)   : 250=1.34%, 500=16.70%, 750=75.04%, 1000=0.90%
  lat (msec)   : 2=5.75%, 4=0.22%, 10=0.04%, 20=0.01%, 50=0.01%
  cpu          : usr=0.18%, sys=2.77%, ctx=16517, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=16464,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=6585KiB/s (6743kB/s), 6585KiB/s-6585KiB/s (6743kB/s-6743kB/s), io=64.3MiB (67.4MB), run=10001-10001msec

One quick way to test would be to map it with iSCSI and then use CrystalDiskMark to check the speed (on windows)

Well, since I only use Windows for Steam these days, it wasn’t a quick task for me :).
But now I’ve got iSCSI running on both Linux and Windows. Unfortunately, NVMe over TCP seems to be limited to Windows Server or requires some non-trivial third-party software, so there’s no proper way to compare it on Windows for me currently. After spending a few hours, I decided to pause until upgrading to a faster network (probably 10G next year) and setting up at least a mirrored pair of fast NVMe drives as the test base.

I’d hide the test results under a spoiler, since I’m not confident they accurately reflect differences in the technologies. Instead, I’d summarize my observations, specific to my setup and a 2.5 Gbps network:

  • Both the Linux NVMe/TCP and iSCSI initiators, as well as Windows iSCSI, can saturate the link under suitable load.
  • On Linux, NVMe/TCP performs visibly better than iSCSI on writes.
  • On Linux, iSCSI write performance is noticeably worse than its read performance, and similarly, Windows iSCSI writes are weaker than reads.
  • Increasing I/O depth and/or concurrent jobs generally improves throughput.
  • While CrystalDiskMark reported great numbers, I couldn’t reproduce those results when copying a 50 GB VMDK file to the iSCSI-mounted volume.
The results table is below, but take it with a big grain of salt:

READ

Test Type     | NVMe/TCP      | iSCSI         | iSCSI W11 CryMrk
              | MB/s    IOPS  | MB/s    IOPS  | MB/s    IOPS
SEQ1M Q8T1    | 294     280   | 294     280   | 294     281
SEQ1M Q1T1    | 216     205   | 233     222   | 268     256
RND4K Q32T1   | 275     67k   | 233     56.9k | 262     64k
RND4K Q1T1    | 9.8     2338  | 11.8    3033  | 26      6290
RND4K Q8T4    | 271     66k   | 267     64k   | 260     64k

WRITE

Test Type     | NVMe/TCP      | iSCSI         | iSCSI W11 CryMrk
              | MB/s    IOPS  | MB/s    IOPS  | MB/s    IOPS
SEQ1M Q8T1    | 290     276   | 188     179   | 292     279
SEQ1M Q1T1    | 216     205   | 139     143   | 173     165
RND4K Q32T1   | 27      6606  | 13      3183  | 47      11.5k
RND4K Q1T1    | 11      2748  | 3       689   | 4.7     1146
RND4K Q8T4    | 91      22k   | 20      5252  | 52      12.7k
… 4k zvl rs   | 246     60k   | 200     49k   |
fio commands for reference
# SEQ1M Q8T1
fio  --name=seqread --rw=read  --filename=$NVMEoF --bs=1024k --numjobs=1 --iodepth=8 --ioengine=libaio --direct=1 --size=32G --group_reporting
fio  --name=seqread --rw=read  --filename=$ISCSI  --bs=1024k --numjobs=1 --iodepth=8 --ioengine=libaio --direct=1 --size=32G --group_reporting

# SEQ1M Q1T1
fio  --name=seqread --rw=read  --filename=$NVMEoF --bs=1024k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=32G --group_reporting
fio  --name=seqread --rw=read  --filename=$ISCSI  --bs=1024k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=32G --group_reporting

# RND4K Q32T1
fio  --name=randread --rw=randread  --filename=$NVMEoF --bs=4k --numjobs=1 --iodepth=32 --ioengine=libaio --direct=1 --size=32G --group_reporting
fio  --name=randread --rw=randread  --filename=$ISCSI  --bs=4k --numjobs=1 --iodepth=32 --ioengine=libaio --direct=1 --size=32G --group_reporting

# RND4K Q1T1 (just 2G)
fio  --name=randread --rw=randread  --filename=$NVMEoF --bs=4k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=2G --group_reporting
fio  --name=randread --rw=randread  --filename=$ISCSI  --bs=4k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=2G --group_reporting

fio  --name=randread --rw=randread  --filename=$NVMEoF4k --bs=4k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=2G --group_reporting

# RND4K Q8T4
fio  --name=randread --rw=randread  --filename=$NVMEoF --bs=4k --numjobs=4 --iodepth=8 --ioengine=libaio --direct=1 --size=8G --group_reporting
fio  --name=randread --rw=randread  --filename=$ISCSI  --bs=4k --numjobs=4 --iodepth=8 --ioengine=libaio --direct=1 --size=8G --group_reporting

## write
# SEQ1M Q8T1
fio  --name=seqwrite --rw=write  --filename=$NVMEoF --bs=1024k --numjobs=1 --iodepth=8 --ioengine=libaio --direct=1 --size=32G --group_reporting
fio  --name=seqwrite --rw=write  --filename=$ISCSI  --bs=1024k --numjobs=1 --iodepth=8 --ioengine=libaio --direct=1 --size=32G --group_reporting

# SEQ1M Q1T1
fio  --name=seqwrite --rw=write  --filename=$NVMEoF --bs=1024k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=32G --group_reporting
fio  --name=seqwrite --rw=write  --filename=$ISCSI  --bs=1024k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=32G --group_reporting

# RND4K Q32T1 (just 8G)
fio  --name=randwrite --rw=randwrite  --filename=$NVMEoF --bs=4k --numjobs=1 --iodepth=32 --ioengine=libaio --direct=1 --size=8G --group_reporting
fio  --name=randwrite --rw=randwrite  --filename=$ISCSI  --bs=4k --numjobs=1 --iodepth=32 --ioengine=libaio --direct=1 --size=8G --group_reporting
fio  --name=randwrite --rw=randwrite  --filename=$NVMEoF4k --bs=4k --numjobs=1 --iodepth=32 --ioengine=libaio --direct=1 --size=8G --group_reporting 

# RND4K Q1T1 (just 2G)
fio  --name=randwrite --rw=randwrite  --filename=$NVMEoF --bs=4k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=2G --group_reporting
fio  --name=randwrite --rw=randwrite  --filename=$ISCSI  --bs=4k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=2G --group_reporting

fio  --name=randwrite --rw=randwrite  --filename=$NVMEoF4k --bs=4k --numjobs=1 --iodepth=1 --ioengine=libaio --direct=1 --size=2G --group_reporting

# RND4K Q8T4 (just 2g)
fio  --name=randwrite --rw=randwrite  --filename=$NVMEoF --bs=4k --numjobs=4 --iodepth=8 --ioengine=libaio --direct=1 --size=2G --group_reporting
fio  --name=randwrite --rw=randwrite  --filename=$ISCSI  --bs=4k --numjobs=4 --iodepth=8 --ioengine=libaio --direct=1 --size=2G --group_reporting
fio  --name=randwrite --rw=randwrite  --filename=$NVMEoF4k --bs=4k --numjobs=4 --iodepth=8 --ioengine=libaio --direct=1 --size=2G --group_reporting
fio  --name=randwrite --rw=randwrite  --filename=$ISCSI4k  --bs=4k --numjobs=4 --iodepth=8 --ioengine=libaio --direct=1 --size=2G --group_reporting

Thank you for the efforts and for summarizing the results. Even though our initial expectations were not met, the learnings from this were definitely worth it. I appreciate you taking the time for that, and doing it for all of us.

Just a suggestion from my experience: if you decide to go 10 GbE in the future, stick with the fiber (SFP+) version if possible and skip copper Ethernet (especially any USB4/Thunderbolt adapters) to avoid heat-related issues.

… and power consumption, with EU electricity costs :frowning:
Realtek’s new RTL8127 looks promising at 2 W for RJ45, but there are no SFP+ options yet, and it’s currently only available through AliExpress.

Thanks for this, @imvalgo . :slight_smile: I’ve been able to get an NVME over TCP disk connected to a VM, though I’m having some issues with getting it to auto-start on boot with just the Basic Setup.

In particular, this is a Debian 13 VM (Proxmox) with two network interfaces, one of which is on the storage network where TrueNAS lives. That interface doesn’t come up fast enough, and the connect unit fails at boot.

I’m still trying to figure out the best way around this. I don’t run systemd-networkd to manage my interfaces, but apparently it can be used to force a service to wait for a specific interface before it starts even if it’s not being used to manage all the network services. I’m looking into that now.
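In the meantime, a drop-in along these lines might work (untested; `eth1` is a placeholder for the storage interface name):

```ini
# /etc/systemd/system/nvme-basic.service.d/wait-iface.conf (sketch)
[Unit]
BindsTo=sys-subsystem-net-devices-eth1.device
After=sys-subsystem-net-devices-eth1.device
```

Caveat: the device unit only tells systemd that the kernel has registered the NIC, not that it has an address yet, so the systemd-networkd-wait-online approach mentioned above is probably more robust.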

Separately, there’s a race condition that can stop a device using NVMe over TCP from shutting down if an automated nvme keep-alive command is issued while the fabric is being shut down. I ran into this a few minutes ago; the kernel crashed on shutdown and it just sat there for ten minutes looping an error until I reset it.

Apparently, at one point this was so bad that it resulted in a CVE:

Apparently, it should be fixed in the version of kernel I’m running but … is not? I had manually mounted an NVME target from TrueNAS. Maybe it crashed because I didn’t unmount it? In any case, be wary. :slight_smile:

For encryption, couldn’t you just use the host OS to encrypt the mounted storage?

To auto mount:

You would have to make the mount persistent by modifying the /etc/fstab using your favorite text editor.

Make a mount location
sudo mkdir /media/(nvme name you want)

Find the UUID
lsblk -o NAME,SIZE,FSTYPE,UUID,MOUNTPOINTS

Add this to the /etc/fstab
UUID=(from above) /media/(nvme name) (fs type, e.g. ext4) defaults,nofail 0 2

This might be a VM thing as I’ve not had issues using baremetal Ubuntu or Mint. Whatever the case, try creating an .sh that will disconnect the NVMe drive.

There are advantages to using NVMe-oF vs iSCSI, though your pool design may be a limiting factor.

Thanks for the additional info. I should have kept reading the secured part for more info on setting up the mounts and stuff. I stopped when I couldn’t get the basic setup to work. Oops. My fault for rushing. :slight_smile:

Agreed. There’s something wrong with that VM. At one point, SSH was failing because the ethernet interface it was bound to wasn’t coming up fast enough.

I’m going to have to start from scratch and verify that the things that should work out of the box do before I start trying to make things more complicated. :stuck_out_tongue:

EDIT: @Glitch01 @imvalgo I was thinking about it a bit more, and I’m curious. What client OSes are y’all using?

I just realized Debian 13, Ubuntu Server 24 LTS, and Ubuntu Server 25.10 all have different kernels and presumably different versions of nvme-cli in their repos. I’m going to stick with Debian for now, but I’m definitely curious what impact different kernels have on the performance and other behavior.
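A quick way to capture the relevant versions on each distro for comparison (nvme-cli and the nvme_tcp module may be absent on a fresh box, hence the fallbacks):

```shell
# Kernel, nvme-cli, and nvme_tcp module versions, with fallbacks when absent.
uname -r
command -v nvme >/dev/null && nvme version || echo "nvme-cli not installed"
modinfo -F vermagic nvme_tcp 2>/dev/null || echo "nvme_tcp module info unavailable"
```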

You can definitely tell NVMe-oF is still relatively new compared to iSCSI. So many things to test and explore.

I’m running into some odd behavior when I try to format the disk.

@Glitch01 , did you use thick or thin (sparse) provisioning on the zVol?
I tried it with sparse, and I think that’s causing too much I/O delay since the space isn’t pre-allocated.

Needless to say, I don’t trust this and will be redoing it with a thick zVol. :stuck_out_tongue:

root@flynn:~# time mkfs.ext4 -v -E lazy_itable_init=0 /dev/nvme1n1
mke2fs 1.47.2 (1-Jan-2025)
fs_types for mke2fs.conf resolution: ‘ext4’, ‘big’
[  453.536169] nvme1c1n1: I/O Cmd(0x2) @ LBA 0, 128 blocks, I/O Error (sct 0x3 / sc 0x71)
[  453.536682] I/O error, dev nvme1c1n1, sector 0 op 0x0:(READ) flags 0x2080700 phys_seg 13 prio class 2
Discarding device blocks: done
warning: 16 blocks unused.

Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=8192 blocks
134742016 inodes, 1073741824 blocks
53687091 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=3221225472
32768 block groups
32768 blocks per group, 32768 fragments per group
4112 inodes per group
Filesystem UUID: d67acecd-5f14-4773-a631-5d7fee5a4ce9
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

root@flynn:~# fsck -V /dev/nvme1n1
fsck from util-linux 2.41
[/usr/sbin/fsck.ext4 (1) -- /srv/NVMEoF/retronasData] fsck.ext4 /dev/nvme1n1
e2fsck 1.47.2 (1-Jan-2025)
/dev/nvme1n1: clean, 12/134742016 files, 8783389/1073741824 blocks

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

real    0m44.777s
user    0m0.031s
sys     0m0.058s

EDIT: I ran into the same behavior with a thick-provisioned zVol.

root@flynn:~# time mkfs.ext4 -v -E lazy_itable_init=0 /dev/nvme1n1
mke2fs 1.47.2 (1-Jan-2025)
fs_types for mke2fs.conf resolution: ‘ext4’, ‘big’
[ 1063.959747] nvme1c1n1: I/O Cmd(0x2) @ LBA 8589932672, 128 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1063.960310] I/O error, dev nvme1c1n1, sector 8589932672 op 0x0:(READ) flags 0x2080700 phys_seg 16 prio class 2
Discarding device blocks: done
warning: 16 blocks unused.

Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=8192 blocks
134742016 inodes, 1073741824 blocks
53687091 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=3221225472
32768 block groups
32768 blocks per group, 32768 fragments per group
4112 inodes per group
Filesystem UUID: c2a0e280-a012-4aaf-9684-ce04020a5a0e
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

real    0m45.199s
user    0m0.029s
sys     0m0.057s

EDIT 2: Okay, so those errors don’t seem to have actually amounted to real problems.

I’m passing fsck on the NVME-oF drive, and getting ~ 1 GB/s reads on hdparm. That’s exactly what I’d expect from a VM talking to TrueNAS at 10 Gbps, though I’m certainly pleased to see it with sync enabled on the underlying ZFS storage.

Suspicion: the guest Debian 13 VM is treating the disk like a real NVMe disk, so I assume fdisk, hdparm, mkfs, etc. try to issue low-level NVMe commands that the NVMe-oF subsystem won’t/can’t process. They stall, time out, are apparently not critical failures, and the system goes on with what it’s supposed to be doing.

@HoneyBadger , does that sound like a reasonable explanation for that behavior, or should I be worried about this?

Also, TRIM does not appear to be supported inside the guest VM, but I don’t think that really matters since the real zVol is on an HDD pool.

root@flynn:~# smartctl -a /dev/nvme1n1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.69+deb13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       TrueNAS DXP8800 Plus
Serial Number:                      237683a5018fd0f68c21
Firmware Version:                   25.10.1
PCI Vendor/Subsystem ID:            0x0000
IEEE OUI Identifier:                0x000000
Controller ID:                      2
NVMe Version:                       1.3
Number of Namespaces:               1024
Namespace 1 Size/Capacity:          4,398,046,576,640 [4.39 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sat Feb 14 18:50:22 2026 CST
Firmware Updates (0x03):            1 Slot, Slot 1 R/O
Optional NVM Commands (0x002c):     DS_Mngmt Wr_Zero Resv
Log Page Attributes (0x07):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg
Namespace 1 Features (0x12):        NA_Fields NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +    25.00W       -        -    0  0  0  0       16       4

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        -
Available Spare:                    0%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    20,606 [10.5 GB]
Data Units Written:                 68,015 [34.8 GB]
Host Read Commands:                 16,735
Host Write Commands:                37,188
Controller Busy Time:               0
Power Cycles:                       0
Power On Hours:                     0
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Error Information (NVMe Log 0x01, 16 of 128 entries)
No Errors Logged

Self-tests not supported

Here are the hdparm results. I’m still learning to use fio. :stuck_out_tongue:

johntdavis@flynn:~$ sudo hdparm -t /dev/nvme1n1

/dev/nvme1n1:
Timing buffered disk reads: 3196 MB in  3.03 seconds = 1055.93 MB/sec
johntdavis@flynn:~$ sudo hdparm -t /dev/nvme1n1

/dev/nvme1n1:
Timing buffered disk reads: 3196 MB in  3.02 seconds = 1057.83 MB/sec
johntdavis@flynn:~$ sudo hdparm -t /dev/nvme1n1

/dev/nvme1n1:
Timing buffered disk reads: 3196 MB in  3.01 seconds = 1063.12 MB/sec

Running the same test with --direct gives results closer to what I suspect the actual physical disks can do without the ARC intervening.

root@flynn:~# hdparm -t --direct /dev/nvme1n1

/dev/nvme1n1:
Timing O_DIRECT disk reads: 2444 MB in  3.00 seconds = 814.21 MB/sec
root@flynn:~# hdparm -t --direct /dev/nvme1n1

/dev/nvme1n1:
Timing O_DIRECT disk reads: 2402 MB in  3.00 seconds = 800.04 MB/sec
root@flynn:~# hdparm -t --direct /dev/nvme1n1

/dev/nvme1n1:
Timing O_DIRECT disk reads: 2476 MB in  3.00 seconds = 825.13 MB/sec
root@flynn:~# hdparm -t --direct /dev/nvme1n1

/dev/nvme1n1:
Timing O_DIRECT disk reads: 2454 MB in  3.00 seconds = 817.51 MB/sec

EDIT 3: @Glitch01 How did you decide to disable the ext4 write barriers (`barrier=0`)? I hadn’t seen that before; from my reading, it improves write performance but increases the risk of corruption in the event of a power loss. I’ve left it as-is, since my Proxmox server and my NAS are both UPS-backed and supposed to shut down after a power loss.

I’m curious if you noticed a big write I/O performance hit in your testing with barriers on.