TrueNAS + TrueCloud Backup hella slow

I have been playing with Storj backups on TrueNAS using the TrueCloud (restic) backup system. After a variety of issues, I finally completed a full backup of 1.2 TB of data. First attempts yielded VERY slow full backups, but Storj must have changed something, since the last two attempts gave me 1 Gbps throughput on backup (just about the max speed of my connection). Since the first upload is less important to me than differential backups, I was not too worried about its speed. However, when running subsequent differential backups, TrueCloud (restic + Storj) is horrifically slow compared to the same data set with Backblaze and Cloud Sync.

File set: 142,218 files; 15 new files, 13 changed files, 1 new directory, 32 changed directories (mostly Nextcloud)

Backblaze Cloud Sync backup duration: 43 seconds
Storj TrueCloud Backup duration: 64 minutes!!!

The slow speed is very consistent across backup attempts, and it always seems to grind on the Nextcloud dataset (which has tons of small files).

If there were a way to pass additional parameters to restic, I would try the --no-scan option, but I do not see a way to do it in the UI.
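For context, on a plain restic install this would just be an extra flag on the backup command. A sketch only; the repository URL and path below are placeholders, and --no-scan needs a reasonably recent restic:

    # Placeholder repo/path. --no-scan skips the upfront file scan restic
    # uses only to estimate progress; it does not change what gets backed up.
    restic -r s3:https://gateway.storjshare.io/my-bucket backup /mnt/tank/nextcloud --no-scan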

Is anyone else experiencing these issues, or does anyone have ideas on how to address them? After working on this for nearly a week, I think the problem is restic with small files, but I want this to work if possible.

Thanks.

@HoneyBadger Did you see this one? ^ I talked to Bill at Storj and he indicated you all were going to work on fixing the lock issue on abort, but this one really impairs the usefulness of the restic implementation for Storj. I know I can set this up as a traditional Cloud Sync task, but then I lose restic and snapshots (the restic kind). Backblaze makes more sense for me there, as I have been using it for years with good success.

As configured on the same data sets, TrueCloud is taking 4,000+ seconds to run nightly (<100 file changes) and Cloud Sync to Backblaze is taking 43 seconds (same number of files synced). Seems worth trying to fix to me.

Thanks.

@Theo Appreciate the ping - I’ll bring this back to the mutual engineering teams. Thanks for including your stats re: file count and sizing for the job as well.

No worries, let me know if you need anything else. The slowness for small incremental uploads is consistent (I am sure it is the file count in the main dataset causing it to go slow). Bill at Storj has all my contact information as well.

@HoneyBadger I am also experiencing the same issues as @Theo. My first backup to Storj was fairly quick, but every subsequent one seems dog slow. I was speaking to Derek at Storj and he indicated that it may be an issue with the number of files we have. He also stated that from their backend they are seeing the backup only actually take an hour, and that it might be a restic/TrueCloud issue that makes it seem like it is taking forever. Our datasets do have a large number of small files. We are not really able to control this, as these files are made by the PLC and robot software we use for our customers, which we keep backups of. Here is some info about the datasets we have; I calculated the times from the start and end times in the job logs:

Dataset     Size        # of files   Storj backup time   Backblaze backup time
Dataset 1   681.6 GB       816,790   3.75 hours          6 min
Dataset 2    32.6 GB        28,111   10 min              17 sec
Dataset 3   520.7 GB       364,489   1.5 hours           3 min
Dataset 4     7.8 GB        18,528   38 sec              30 sec
Dataset 5     1.8 TB     3,603,000   16 hours            30 min

Hopefully this additional data helps sort this out. Until then, I will probably be sticking with Backblaze.

@cpbpilot can you take a look at htop during a backup? I’m curious where the bottleneck is here: is restic doing some heavy CPU operations to encrypt/dedup/compress, or is it something else?

@kris I sent you a private message with a link to a video. I didn’t want the video to be available to everyone. As far as I can tell the CPU usage is low. It has 6 cores and runs in the 3-16% range.


@kris Same for me, a bunch of restic jobs using little CPU or memory. The system is 16 cores with 96 GB of memory, and it is basically sleeping while this is running.

@kris Sorry for the large photo dump, but I also see weird internet traffic during the backup job, with LOTS of tiny little spikes and this weird data out at the 46th minute of every hour. @Theo are you seeing weird interface traffic also?

[Screenshots: network interface traffic graphs captured during the backup job]
Mine does about the same; this is the first hour of my upload.

Backblaze took 47 seconds to run; this has been running for an hour and will probably grind for 3 more. 2 GB upload.

It seems unlikely it’s a Storj issue… you can validate that by doing a Cloud Sync to Storj and comparing the time with Backblaze.

The TrueCloud backup (restic) software is doing a lot more work than a basic sync.

It would be useful to get the settings for the tasks.

It’s also useful to get the logs:

There are two background tasks that can take time:

  • Check: this validates the backups and snapshots
  • Delete/prune: this removes unwanted snapshots
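For anyone who wants to poke at this outside the UI, those two phases map roughly onto plain restic commands; a sketch only, with placeholder repository and retention values:

    # "Check" phase: validates the repository structure and snapshots.
    restic -r <repo> check

    # "Delete/prune" phase: drops snapshots outside the retention policy,
    # then compacts pack files; the --keep-* values here are examples only.
    restic -r <repo> forget --keep-daily 7 --keep-weekly 4 --prune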

It would be useful if you could see which process is taking the time.

In particular, I need to know which of these settings you’ve been using:

[Screenshot: the transfer/performance setting options]

(Fast storage should max out what it does on the TrueNAS side)

@kris When I originally set up all of the backup tasks a few weeks ago, I left everything at defaults, so it was using the default setting. After seeing the slow performance, I changed everything to use fast storage. All the numbers I gave you are with the fast storage setting.

I have run all three settings and none of them change the really long run times. I am currently running in fast storage mode, since that should be the fastest. (I have a very big NAS with very little activity at night and 2 GB upload internet.) When I first dug into this, I set up Storj as a Cloud Sync task and it ran a bit slower than Backblaze, but only by seconds. I am pretty sure the issue is related to restic and lots of small files. Here is my full job configuration:

Also, I found this in my digging. IIRC pack size is controlled by the transfer setting, but even fast storage does not allow for 60 MiB, which should be ideal for Storj:
Restic can become slow when handling many small files due to the overhead of checking each file’s metadata. Here are some suggestions to improve performance:

  • Use the --no-scan option: this disables the file scan restic performs to estimate progress, which reduces overhead but means you won’t see a progress estimate.
  • Increase read concurrency: the number of files read in parallel can be raised by setting the RESTIC_READ_CONCURRENCY environment variable or using the --read-concurrency option on the backup command. This can speed up the backup when files are stored on fast storage like NVMe disks.
  • Adjust pack size: for very large repositories or fast upload connections, increasing the pack size reduces the number of files in the repository and improves upload performance. This can be done with the --pack-size option or the RESTIC_PACK_SIZE environment variable.
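For reference, if iX ever exposes a way to pass these through, the plain restic equivalent would look roughly like the sketch below. The repo URL, path, and values are placeholders, and I don’t know whether the TrueCloud middleware forwards environment variables to restic:

    # Placeholder values; tune to your hardware and uplink.
    export RESTIC_READ_CONCURRENCY=8   # read more small files in parallel
    export RESTIC_PACK_SIZE=60         # target pack size in MiB (fewer, larger uploads)

    restic -r s3:https://gateway.storjshare.io/my-bucket \
        backup /mnt/tank/nextcloud \
        --no-scan                      # skip the progress-estimation scan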

Also, last night TrueCloud ran for 107 minutes with 4 files changed and 1 file deleted. Cloud Sync with Backblaze took 41 seconds for the same changes.

My guess would be that pack size is irrelevant with such small changes…

The --no-scan and read-concurrency options may be important. Can you test either?

I just submitted a ticket per Kris’s request; please see below:
https://ixsystems.atlassian.net/browse/NAS-134067

Agreed, though having an option for 60 MiB would be ideal.

I am not sure it would be a valid test unless I ran it through the middleware client, and I am not sure how to do that. If there is a command I could run that would allow me to use the additional parameters, I am happy to test it.
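If a rough data point is still useful, one thing I could try (outside the middleware, so not a perfect comparison) is running restic by hand against a throwaway bucket with the same flags. Everything below is a placeholder sketch, not the actual TrueCloud job:

    # Placeholder credentials and bucket; use a scratch bucket, not the real repo.
    export AWS_ACCESS_KEY_ID=<storj-s3-gateway-key>
    export AWS_SECRET_ACCESS_KEY=<storj-s3-gateway-secret>
    export RESTIC_REPOSITORY=s3:https://gateway.storjshare.io/scratch-bucket
    export RESTIC_PASSWORD=<repo-password>

    restic init                                       # one-time repo creation
    time restic backup /mnt/tank/nextcloud --no-scan  # compare wall time to the TrueCloud run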