Poor USB 3 performance writing to single-drive vdev

I’m using an external USB 3 drive dock for backups. I use bare 3.5” drives (kept in nice plastic boxes when not in use) and drop them in the docking station for backups. (Among other issues, all the motherboard SATA ports are used for the data array, 6 disks.)

The write performance is fairly awful – around 30 MB/s. (With one particular drive I’ve been testing with, an old Seagate Barracuda 2TB, model ST2000DM001.)

Yes, I’m using a USB 3 port on my server box! (It’s marked as such, and various Linux tools report it as such.)

As a quick test, I took the dock over to my Windows laptop, reformatted the drive as NTFS, mounted it, and copied a big pile of photos over from the laptop’s (presumably fast) SSD. It achieves 150MB/sec pretty often and never drops under 100. So, 3 to 5 times faster. Same dock, same cable, same drive.

So, is the problem the ZFS write code (a pool consisting of a single-drive vdev), the Linux USB code, the hardware on the server’s motherboard, or something else I haven’t thought of? At one level it may not matter, since I can’t fix the first two and probably can’t afford to fix the third (the motherboard). But I’m curious, and maybe there’s something I’m overlooking. Taking about 10 hours per terabyte to do backups is painful. (Yes, yes, incremental backups will be considerably faster once I get a set of full backups established for all the datasets.)

Anybody have any clues? Other information I can get my system to cough up that might tell us something? Is this a known problem with no solution?

How are you copying data? SMB, Rsync, ZFS Replication?

On the server, the actual backup system, I’m using zfs send/receive.
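The mechanics look roughly like this; the pool, dataset, and snapshot names below are made-up placeholders, not my actual ones:

```shell
# Hypothetical names: "tank" is the data pool, "backup" is the
# single-drive pool on the USB dock.

# Full send of one dataset's snapshot:
zfs snapshot tank/photos@base
zfs send tank/photos@base | zfs receive backup/photos

# Later incrementals transfer only the blocks changed between snapshots:
zfs snapshot tank/photos@next
zfs send -i tank/photos@base tank/photos@next | zfs receive backup/photos
```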

The main data array can sustain about 3.5 gigabits per second read (measured copying pools a while back, between 2 servers with 10Gbit ethernet). It’s a raidz2 array, 6 disks.

Isn’t that an SMR model, which’d likely have poor performance with ZFS & sustained writes in general?

Seagate Barracuda ST2000DM001 2TB SMR

https://nascompares.com/answer/list-of-wd-cmr-and-smr-hard-drives-hdd/


It is in fact an old model, and it is in fact an SMR model, which will affect performance, especially writes.

But I compared performance of that exact drive, in the exact same USB dock, and got 5x faster writes on Windows (which is not famous for IO performance).

Old spare drives lying around are great for testing things. :)


ZFS and SMR just don’t get along. We see huge resilver times when users have any SMR drives, compared to CMR.
SMR vs CMR ServeTheHome


You are comparing apples to watermelons; there is a serious difference in the low-level I/O patterns. In some ways, SMR was designed for filesystems that are not copy-on-write (COW).

Many disk manufacturers likely test their consumer SATA drives against desktop versions of MS-Windows, and their enterprise drives against the server versions. They likely DON’T test consumer drives with ZFS (see the Western Digital Red SMR disaster a few years back). They probably don’t test enterprise drives with ZFS either, letting the enterprise computer & storage manufacturers do that testing with *nix variants.

PS: The Western Digital Red SMR disaster is when WD changed their WD Red NAS drives to SMR without testing on ZFS, and without letting anyone know. Today most traces of the old WD Red SMR drives have disappeared from WD’s website, leaving only the WD Red Pro (which was not affected) and the WD Red Plus (the re-labeled WD Red CMR drives).

Agreed, but if you’ve got a spare CMR drive I’d retest & compare with that too. USB & ZFS together are already in the realm of the suboptimal; toss SMR into the mix & I’d argue it explains the results.

At worst, I’ll be wrong & you’ll get the same results, then we can go deeper into the rabbit hole.

Found another relevant item: this old dock doesn’t support UAS (USB Attached SCSI). Things go moderately faster on a newer dock that does (one I also have kicking around), up in the 50-70MB/s range rather than 30 (but not up in the 120 range that Windows gets to). So the old dock is on its way to recycling, as is the even older USB 2 dock.
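For anyone wanting to check their own dock on Linux, one way (among others) to see which driver the kernel bound is:

```shell
# "Driver=uas" in the tree means USB Attached SCSI is active;
# "Driver=usb-storage" means the dock fell back to the older,
# slower Bulk-Only Transport protocol.
lsusb -t

# The kernel also logs which driver claimed the disk at plug-in time:
dmesg | grep -i -e uas -e usb-storage
```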

I’ll certainly make sure any new drives I buy are CMR. SSD is out of the question in the size range I’m working in; not industrial scale, but my data pool is 6x6TB, and the set of datasets on it can back up reasonably onto two 6TB drives and one 4TB drive for now (obviously the pool is not full). Yeah, I could buy drives today that would back up the full pool onto one drive, but imagine how long that would take. Plus those huge drives are likely to be SMR, last I looked. Plus expensive.

Arguably one needs at least three backup sets: two to rotate locally plus one to be stored off-site and periodically swapped with one of the local ones. You don’t ever want your only backup and the live disks online at the same time; too tempting a target for a lightning strike or whatever. Or a power supply fire, maybe. Anyway, let’s just never do that! Even if it’s a financial strain. (And I’ve got the existing backup drives in those sizes, with not many hours on them since they sit in the safe most of their lives, so I can just get the new backup scheme running on the old drives.)


Do you have many snapshots on your system? Some of them could refer to datasets that don’t have much data in the sense of large files, but could have recurring changes such as logs, and those would take a significant amount of time to replicate.

You could try the following:

  • Create a dataset on your backup drive.
  • Copy a couple of large files using mc (Midnight Commander) or cp via the CLI.

What speed are you getting?
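A quick sequential-write check along those lines, assuming the single-drive pool is named `backup` and mounted at `/mnt/backup` (both names are placeholders):

```shell
# Create a scratch dataset on the single-drive backup pool.
zfs create backup/speedtest

# Write 4 GiB and report throughput. conv=fdatasync makes dd wait for
# the data to actually reach the disk, so the number isn't inflated by
# RAM caching.
dd if=/dev/urandom of=/mnt/backup/speedtest/testfile bs=1M count=4096 conv=fdatasync

# Clean up afterwards.
zfs destroy backup/speedtest
```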

30 MB/s over USB 3 does seem too slow, especially if Windows gets much better speeds with the same drive. It could be ZFS sync settings or how TrueNAS handles USB drives, so you might want to check pool settings and ashift values. There’s no quick fix, sadly; storage tuning usually takes some testing.
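To rule out sync writes and a sector-size mismatch, something like this shows the relevant settings (assuming the pool is named `backup`; substitute your actual pool name):

```shell
# ashift=12 means 4 KiB-aligned writes, which suits nearly all modern
# drives; ashift=9 on a 4K-sector drive forces read-modify-write cycles.
zpool get ashift backup

# sync=always forces every write to stable storage immediately, which
# is brutal over USB; "standard" is the usual setting.
zfs get sync,recordsize,compression backup
```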


What’s a lot? Hourly automated snapshots with a 2-week lifespan, which should only be 336. Although I don’t think the ones previously replicated to backup pools get deleted on schedule on the host. That’s on the list to look into and deal with, probably with a script that lists the snapshots, finds the automated ones, reads the timestamp and lifespan, and deletes the ones that shouldn’t still be there.
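A sketch of that cleanup idea, as a shell function. The `auto-YYYY-MM-DD_HH-MM` naming scheme is an assumption for illustration; adjust the pattern to whatever your snapshot task actually produces:

```shell
# Decide whether an automated snapshot is past its lifespan.
# $1 = cutoff date (YYYY-MM-DD), $2 = full snapshot name.
# Assumes names like tank/data@auto-2024-05-01_00-00 (hypothetical).
snapshot_expired() {
    case "$2" in
        *@auto-*) ;;              # only automated snapshots qualify
        *) return 1 ;;
    esac
    stamp=${2##*@auto-}           # e.g. 2024-05-01_00-00
    day=${stamp%%_*}              # e.g. 2024-05-01
    [ "$day" \< "$1" ]            # ISO dates compare correctly as strings
}

# Dry run: print destroy commands for expired snapshots instead of
# running them (GNU date syntax).
cutoff=$(date -d '14 days ago' +%Y-%m-%d)
zfs list -H -t snapshot -o name 2>/dev/null | while read -r snap; do
    snapshot_expired "$cutoff" "$snap" && echo "zfs destroy $snap"
done
```

Printing the `zfs destroy` commands first, rather than executing them, makes it easy to eyeball the list before pulling the trigger.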

Hourly automated snapshots with 2-week lifespan, but that should only be 336.

To give you some context: I have a few iocage jails, and some of the “root” datasets can reference 1-2GB of daily data. When I replicate them to my backup server over 1Gb ethernet, they can take more than an hour each (I saw write speeds around 10MB/s). The replication is from a 6-disk-wide RAIDZ2 natively encrypted pool (18TB X18 drives) to a single 18TB X18 drive.

However, if I have datasets that contain large files, I can easily max out the 1Gb/s ethernet at about 980Mb/s.

Transfer is from Threadripper 1950X to Threadripper 2950X each with 128GB RAM.


Random day-to-day use doesn’t pile up much, but when I come back from an event or a trip with a pile of photos (or a video shoot), there can be a BIG bump, even hundreds of gigabytes. (People doing video professionally tend to need even heavier file server support than I give myself.)