Using LTO Tapes to back up your TrueNAS?!

Preface:

I typically start these types of articles with a definition of the title and some explanation.

Today I am going to break that trend and jump into an anecdote from my professional life. Just about 5 years ago, I started as the Datacenter Operations Manager for a large public school district. As part of my “orientation,” my director walked me through several key systems, and one of them was our backup strategy. At that time, we had two datacenters, each with warm backups of the other’s virtual machines. A sane strategy. When I asked about offsite backups, he showed me a tape library.

Now I want you to picture this in no uncertain terms. I was 27 years old at the time, and the only tapes I had been familiar with up until that point were cassette tapes and VHS tapes. I literally had not even heard the term LTO tape. For years, I badgered my director about my frustrations with this strategy. I had to make sure one of my guys was there every Friday to hand the tapes off to a man with a case. I had to make sure we called back tapes when they expired, or else risk running out of tapes and not being able to write out the weekly rotations. We even occasionally had weird problems I didn’t understand and couldn’t fix. It drove me nuts!

But one day I had an epiphany: something my director said caused an explosion of neurons to fire. Just because it was an electromechanical device with roots going back to before the turn of the millennium didn’t mean it was invalid. We in the tech field are obsessed with rapid growth and change. What I realized that day was that sometimes, what’s old is new again. Tried-and-true technology, stability, and predictability: these all characterize tape backup strategies.

What is LTO?

Wikipedia’s definition (from the “Linear Tape-Open” article):

Linear Tape-Open (LTO) is a magnetic tape data storage technology originally developed in the late 1990s as an open standards alternative to the proprietary magnetic tape formats that were available at the time. Hewlett Packard Enterprise, IBM, and Quantum control the LTO Consortium, which directs development and manages licensing and certification of media and mechanism manufacturers.

So it’s an old, dead format, right? Like VHS or Betamax before it? Well, no. It’s actually a fairly regularly updated standard, with new generations every few years, the most recent (LTO-9) released in 2021. Not to mention, the consortium has a pretty firm roadmap going into the future.

What is a Tape Drive?

A tape drive is a device that can read or write, in this context, LTO tapes. Drives have come in several varieties over the years. They can be internal to a system, in a 5.25” bay like a CD drive.

They can sit on your desk.

Think about a tape drive like you would think about a VCR, except for data instead of movies.

Or they can live in a library.

What is a library, you might ask? If you are a 90s kid like me, you can think about it like one of those cool CD changers from back in the day, where you could load all of your CDs into one unit in a console.

How might I use it?

Ahh! Now that’s the question. For my purposes I am using an HP 1/8 G2 Tape Autoloader. It’s the same as the library pictured above. The 1 in the name means 1 drive lives in the library (some can have two or more), and the 8 means it holds 8 tapes. Autoloader means, much like the name implies, it can automatically load tapes into the drive for you!

HP helpfully put a little LCD screen on the front of the library that lets you set up basic network configuration, and once that’s done we can access the library from the interwebs.

You can see I have 7 slots free out of 7. Huh? I thought it was 8! That’s because one slot is a so-called mail slot.

By default, you have to take the entire magazine out of the thing to remove a tape. When you are swapping out multiple tapes, this is fine. But if you only need 1 tape, it’s kind of a lot of work. With a mail slot, things are simpler: you can just remove a single tape.

The library will automatically move your tapes around for you on request, using a little robot. As an example, here I just commanded it to move the tape currently in the drive into the mail slot.

Getting ready:

From my production TrueNAS server, I have already shipped some of my data over to my friend “Little Lenny”, a backup server instance I run. I now effectively have 2 copies of my data, but while the second copy is on a different system, it is still in the same house as my primary production server. The tape will help me solve that problem.

In this case, Little Lenny has a SCSI connection directly into my tape library over an external Mini-SAS (SFF-8088) cable. This is an example picture of what the drive inside my library looks like.

The cable plugs into a SAS HBA, such as an LSI SAS9207-8e, just as if I were connecting a disk shelf.

On Little Lenny, I have a dataset called pictures, which holds precious family moments I would rather not lose.

So, I have my data, and I have the library connected. How do I get the data onto a tape?

Well, we could use any of a whole slew of backup software utilities: some are free, some cost a lot of money, some come with support, and some don’t. But it doesn’t have to be that complicated. We can use standard Linux commands that already exist on the system, without installing anything at all. The same is true for TrueNAS CORE, though the commands are a bit different.

Let’s go over to the shell and type ‘lsscsi’.

Nice! My tape drive is detected, and it’s at /dev/st0. Now we can use the Linux command ‘mt’ to interact with it. Let’s type:

Code:

mt -f /dev/st0 status

For me, it looks like the system can see the tape and interact with it, so we are ready to move on.
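If you plan to script this, it can help to check for a loaded tape before writing anything. Here’s a minimal sketch, assuming the mt-st flavor of ‘mt’, whose status output includes a “General status bits on” line listing flags such as ONLINE (tape loaded) or DR_OPEN (no tape). Whether your system ships that flavor is an assumption, and the function name tape_online is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the drive reports a loaded tape.
# ASSUMPTION: the mt-st flavour of mt, whose "status" output contains a line like
#   General status bits on (41010000): BOT ONLINE IM_REP_EN
tape_online() {
    local device="${1:-/dev/st0}"
    # -q: no output, just an exit status; -w: match ONLINE as a whole word only
    mt -f "$device" status 2>/dev/null | grep -qw ONLINE
}

# Usage: bail out of a backup script when no tape is loaded
# tape_online /dev/st0 || { echo "No tape loaded in /dev/st0" >&2; exit 1; }
```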

Backing up my data

Let’s talk a little bit about tar. If you are reading this guide, I’m sure that at one point or another you have encountered a .tar file, the phrase ‘tarball’, or perhaps a .tar.gz file. To quickly get this out of the way, a .tar.gz is simply a tar file that has been compressed with gzip. But tar itself was created SPECIFICALLY for what we are doing here: taking files and creating an archive on tape. The name itself, TAR, is suspiciously similar to Tape ARchive, is it not?
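To see what tar does without touching a tape, here’s a small sketch that archives a directory into an ordinary file, lists what’s inside, and extracts it somewhere else. Everything lives in a throwaway temporary directory, and the function name tar_roundtrip is hypothetical; on the real system, the tape device would take the place of the archive file.

```shell
#!/usr/bin/env bash
# Hypothetical demo: archive a directory into a plain file, list it, extract it elsewhere.
tar_roundtrip() {
    local workdir
    workdir="$(mktemp -d)"
    mkdir -p "$workdir/src/pictures" "$workdir/restore"
    printf 'hello tape\n' > "$workdir/src/pictures/note.txt"

    # -c create, -f archive file; -C makes the stored paths relative (pictures/...)
    tar -cf "$workdir/pictures.tar" -C "$workdir/src" pictures
    # -t lists the archive members without extracting anything
    tar -tf "$workdir/pictures.tar"
    # -x extracts into the directory given by -C
    tar -xf "$workdir/pictures.tar" -C "$workdir/restore"
    cat "$workdir/restore/pictures/note.txt"

    rm -rf "$workdir"
}

tar_roundtrip
```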

With that out of the way, let’s get some baseline performance numbers with an ideal sequential workload: a stream of zeros. Be aware that this overwrites whatever is already on the tape.

Code:

root@littlelenny[~]# dd if=/dev/zero of=/dev/st0 bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 36.4856 s, 287 MB/s

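That works out to roughly 2.3 Gbit/s, not far off the 312.5 MB/s line rate of a 2.5 Gigabit network adapter, so that’s pretty good! Keep in mind that zeros compress extremely well, so the drive’s hardware compression flatters this number; LTO-6’s native (uncompressed) transfer rate is 160 MB/s.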
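If you want to redo this math for your own runs, here’s a small sketch that turns dd’s byte count and elapsed time into MB/s and Gbit/s. The function name throughput is hypothetical; it uses decimal megabytes, the same unit dd reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a byte count and elapsed seconds into
# decimal MB/s (as dd reports) and Gbit/s (as network links are rated).
throughput() {
    local bytes="$1" seconds="$2"
    awk -v b="$bytes" -v s="$seconds" 'BEGIN {
        mbs = b / s / 1e6          # megabytes per second
        gbits = b * 8 / s / 1e9    # gigabits per second
        printf "%.1f MB/s = %.2f Gbit/s\n", mbs, gbits
    }'
}

throughput 10485760000 36.4856   # prints: 287.4 MB/s = 2.30 Gbit/s
```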

I am going to make a tarball of my dataset /mnt/backup/pictures and push it straight to the tape drive, like this:

Code:

tar -cvf /dev/st0 -C /mnt/backup pictures

Personal preference here, but I like to run tar and dd as separate stages: first build the tarball on disk, then write it to tape with dd:

Code:

tar -cvf /mnt/backup/pictures.tar -C /mnt/backup pictures && dd if=/mnt/backup/pictures.tar of=/dev/st0 bs=1M
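One thing to keep in mind with the two-stage approach: the staging .tar needs roughly as much free space as the dataset itself. Here’s a hedged sketch of a pre-flight check, assuming GNU du and df (as on SCALE). The function name has_room_for_staging and the 5% headroom are hypothetical choices, and du’s apparent size is only an estimate of how big tar’s output will be.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: is there room to stage a .tar of SOURCE in DEST_DIR?
# ASSUMPTION: GNU du/df, as on Debian-based systems.
has_room_for_staging() {
    local source="$1" dest_dir="$2" needed available
    # Apparent size in bytes; tar adds headers and padding on top of this
    needed=$(du -sb "$source" 2>/dev/null | cut -f1)
    # Free bytes on the filesystem holding dest_dir (last line, header skipped)
    available=$(df -B1 --output=avail "$dest_dir" 2>/dev/null | tail -n 1 | tr -d ' ')
    # Give up if either measurement failed
    [[ -n "$needed" && -n "$available" ]] || return 1
    # Require about 5% headroom on top of the apparent size
    (( available > needed + needed / 20 ))
}

# Usage:
# has_room_for_staging /mnt/backup/pictures /mnt/backup || echo "Not enough space to stage the tarball" >&2
```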

How do I restore my data, then?

Code:

dd if=/dev/st0 of=/mnt/backup/pictures_backup.tar bs=1M

And then extract it:

Code:

mkdir -p /mnt/backup/picture_backup && tar -C /mnt/backup/picture_backup -xvf /mnt/backup/pictures_backup.tar

Or, as a single combined statement:

Code:

mkdir -p /mnt/backup/picture_backup && dd if=/dev/st0 bs=1M | tar -C /mnt/backup/picture_backup -xvf -
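Before trusting a tape, it’s worth checking it against the source. GNU tar’s --compare (-d) mode reads an archive and reports any member whose size, timestamps, or contents differ from what’s on disk. Here’s a sketch; the function name verify_archive is hypothetical, and while the test below uses a plain .tar file, the same pipeline should work with /dev/st0 as the input. A clean comparison only makes sense while the source hasn’t changed since the backup was written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare an archive (tape device or .tar file) against the
# directory it was created from. Returns non-zero if anything differs.
# The ( ) body runs in a subshell, so pipefail and the variables stay local.
verify_archive() (
    set -o pipefail
    archive="$1"
    parent_dir="$2"
    # dd reads the archive in 1 MiB blocks; tar -d compares each member
    # against the matching file under parent_dir
    dd if="$archive" bs=1M status=none | tar -d -f - -C "$parent_dir"
)

# Usage, for an archive created with "-C /mnt/backup pictures":
# verify_archive /dev/st0 /mnt/backup && echo "Tape matches the dataset"
```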

In theory, you can pipe tar straight into dd during the backup as well, just like we did above for the restore.
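A sketch of that streaming backup, which skips the staging file entirely, is below. The function name stream_to_tape is hypothetical. The one detail worth calling out is GNU dd’s iflag=fullblock: reads from a pipe can come back short, and without it dd would write each short read as its own, smaller tape record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream a directory straight to tape, no staging file.
# The ( ) body runs in a subshell, so pipefail and the variables stay local.
stream_to_tape() (
    set -o pipefail
    parent_dir="$1"
    name="$2"
    device="${3:-/dev/st0}"
    # iflag=fullblock makes dd gather full 1 MiB blocks from the pipe,
    # so every record written to the device has the same size
    tar -cf - -C "$parent_dir" "$name" | dd of="$device" bs=1M iflag=fullblock status=none
)

# Usage (with /dev/st0 this normally writes from the start of the tape):
# stream_to_tape /mnt/backup pictures /dev/st0
```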
Here's a script I wrote to automate this process via CRON:

Code:

#!/bin/bash
# Define paths and filenames
BACKUP_SOURCE="/mnt/backup/pictures"
DATE=$(date +%Y%m%d%H%M%S)
BACKUP_DEST="/mnt/backup/pictures_backup_${DATE}.tar"
LOG_FILE="/root/pictures_${DATE}_tar_output.log"
TAPE_DEVICE="/dev/st0"

# Create tar file and log the output
if ! tar -cvf "${BACKUP_DEST}" -C "${BACKUP_SOURCE}" . > "${LOG_FILE}" 2>&1; then
    echo "Error: Failed to create tar file." >&2
    exit 1
fi

# Write the tar file to the tape drive
if ! dd if="${BACKUP_DEST}" of="${TAPE_DEVICE}" bs=1M status=progress; then
    echo "Error: Failed to write the tar file to the tape drive." >&2
    exit 1
fi

# No separate EOF step: when dd closes the device after writing, the st driver
# writes a filemark itself, and /dev/st0 then rewinds. Running "mt -f /dev/st0 eof"
# here would write a new filemark at the start of the rewound tape and make the
# backup unreadable.

# Note: the staging .tar stays in /mnt/backup and will accumulate across runs.

The above commands were written for SCALE; CORE would likely need some modifications (on FreeBSD, for example, the tape drive appears as /dev/sa0 rather than /dev/st0).

Scaling Past Single tapes

Obviously, this methodology only works if what you are backing up fits on a single tape. For me, that’s about 2.5 TB, the native (uncompressed) capacity of my LTO6 tapes. Once you get past that point, it becomes impractical to back up your data this way. That’s where backup software comes in.
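To see where that point is for your own data, here’s a back-of-the-envelope sketch that counts how many tapes a dataset needs. It assumes 2.5 TB (2.5 × 10^12 bytes) per LTO-6 tape and no compression, which is a reasonable assumption for photos that are already compressed; the function name tapes_needed is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many tapes does a dataset of BYTES need?
# ASSUMPTION: LTO-6 native capacity of 2.5 TB and no compression.
tapes_needed() {
    local bytes="$1" per_tape=2500000000000
    # Ceiling division: add (per_tape - 1) before dividing
    echo $(( (bytes + per_tape - 1) / per_tape ))
}

tapes_needed 2000000000000    # 2 TB  -> 1
tapes_needed 20000000000000   # 20 TB -> 8
# For a real dataset: tapes_needed "$(du -sb /mnt/backup/pictures | cut -f1)"
```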

Bacula is a great open source tool you can run in a VM, and then use NFS to share your datasets (preserving xattrs!) to be backed up (bacula.org).

If you prefer more commonly used enterprise software, Veeam Backup & Replication Community Edition is free for home users, and you can back up your data to a VM using SMB.

Why Not the Cloud?

I believe in owning my data. If I have an emergency and I need to recover my data, I do not want to be ransomed by the people in the cloud with whom I have entrusted my data. I have a moral objection to egress fees: if things are bad enough that I have to recover from 3rd-tier backups, an additional cost just adds fuel to the fire, and that makes me mad.

Backblaze is well respected and is generally cheaper than its bigger-brand competition.

Let’s use 2.5 TB as our number, since that’s the maximum I can store on a single tape, and scale out the timeline. Keep in mind there is no guarantee that pricing won’t increase over time.

At $5 per TB per month, Backblaze will cost about $12.50 a month, or about $150 for a year.
That’s $300 at year 2.
That’s $750 at year 5.

Now let’s assume I have to retrieve my data once. Add another $25 in egress fees.
We’re at $775.

Now let’s assume I have 8 times that amount of data I would like to back up, which corresponds to the number of tapes my library can hold.
With about 20 TB of data, Backblaze will cost me about $6,000 over 5 years, plus about $200 to retrieve my data just once.

How much did I pay for my tape library, drive, and tapes? $300 for the library, $350 for the drive, and $160 for the tapes, or about $810 in total. If we assume 20 TB of data (about $100 a month on Backblaze), I break even at about 8 months, and I should get about 5 years out of this drive. The tapes themselves are rated to last 30 years if stored properly.
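The arithmetic in this section is easy to rerun with your own numbers. Here’s a rough sketch; the function names are hypothetical, and the rates ($5 per TB per month for storage, $10 per TB of egress) are assumptions backed out of the figures above, so check current pricing before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical cost model. ASSUMED rates, taken from this post's figures:
#   storage: $5 per TB per month; egress: $10 per TB retrieved.
cloud_cost() {
    local tb="$1" months="$2" restores="$3"
    awk -v tb="$tb" -v m="$months" -v r="$restores" \
        'BEGIN { printf "%.0f\n", tb * 5 * m + tb * 10 * r }'
}

# Months of cloud storage that add up to a one-time hardware cost
break_even_months() {
    local tb="$1" hardware_cost="$2"
    awk -v tb="$tb" -v c="$hardware_cost" 'BEGIN { printf "%.1f\n", c / (tb * 5) }'
}

cloud_cost 2.5 60 1        # 775  (one tape, 5 years, one restore)
cloud_cost 20 60 1         # 6200 (eight tapes, 5 years, one restore)
break_even_months 20 810   # 8.1
```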

iXsystems uses the tagline “True Data Freedom”. I think that tapes have a part to play in data freedom, but that’s just my opinion.


All good points, but I still hate tape.

I had to spend days at a remote site reviving a pair of tape drives. As it turns out, the big-ass HP 6U library they were in had a dead PSU and a dying one that likely caused the drives to stop working, but the drives themselves were probably fine… when the thing was shelved a decade ago. And, as it turns out, bearings do not like to live next to the ocean, even if it is inside a climate-controlled room. Fairly easy fix, conceptually, and no recalibration would be necessary.
Fast forward to December 2022 and I need to read out a giant stack of LTO2 and LTO3 tapes in hopes of finding some old satellite images we’d been asked to keep safe (don’t blame me, it was before my time!). Troubleshooting revealed the stuck heads, so I had to sit there gently lubricating the six bearings in each of the drives until the heads could move freely again. The only entertainment I had was to bore two groups of high school kids on a work experience thing. “You guys know what VHS is, right?” I asked, dreading the answer, but it was even worse: “Oh yeah, my grandfather has one of those!”. I also managed to break the ferrite stick on one of the drives, which the head uses together with a coil to sense its current position.
After much fiddling with the tape library and drives, and after getting the heads moving again, out of sheer desperation, I hacked together a setup with an external PSU and the drive’s caddy acting as a SCSI (yeah, parallel SCSI) adapter to convert the drive’s connector to the only cables we had on hand. Magically, the drives started working again!


I forgot the punchline, 6 of like 30 tapes had any data on them and only two or three of those were relevant.


The worst part about this is that those kids are in their 20s now…I had a whole long conversation with a kid who was born after 9/11 and in college…

At least the drive had few enough hours to run…lol

I have colleagues at work who were born in 2000-something. Feels wild. My paternal grandfather was quite literally born in the 19th century.
Though I’m pretty sure the work experience kids aren’t 20 yet, it’s only been a year and a half.

“Are all these tapes empty or did we do something wrong?”
“Look, this one is reading more!”
“So I guess the other ones were empty?”

I’m sort of glad they were mostly empty, because reading out a handful of tapes was painful enough. And we did it with the drive cover still off when we started, so we were following along as the tape spooled up, excited at the blazing speed at which the tape was being fed through the transport…

…then the drive went into reverse…

…and then forward again…

…then Wikipedia told us these tapes have a crapton of tracks and need to be read in a million passes.


I like the idea in principle, but it really doesn’t seem to make a lot of sense:

  • Tape drives are expensive, even if you buy used.
  • Tapes don’t have enough data capacity
    • …and they lie about what they do have. An LTO6 tape holds 6.25 TB, if you assume the data can be compressed 2.5:1. I don’t have terabytes of text files, and ZFS is already compressing what can be compressed.
  • Due to the above, you really need a tape library if you want to back up a serious amount of data. More money, and now you need backup software.

For the cost of the used LTO8 library/drive that would be needed to back up my server (call it 100 TB of relatively-uncompressible data), I could buy another equivalent server, fill it with drives, and have thousands (yes, plural) of dollars left over. Of course, it’d draw a lot more power, but the cost delta would pay for a lot of watt-hours.


Plus, the performance delta is immense. I lucked out and most tapes I needed to read took like 3 minutes to swap, try to read, rewind and try again, and eject - because they were empty.
I don’t remember how long it took me to read the tapes that were full, but it’s clear that it would have been a multi-day process if all the tapes were full. And those were puny LTO2/3 tapes, the newer ones take much longer, as I understand it. And good luck if you need random access.

And tapes aren’t cheap. You’d need massive scale to break even on just cost of hardware and consumables versus spinning rust, and you’d be left with all the disadvantages of tapes.

My take on it is: Play around with it for fun if it’s free; don’t bother otherwise.


Some of the drives I’m finding are cheap, but with a Fibre Channel interface. Too bad TrueNAS doesn’t support that any more.

Again, I kind of like the idea. It would have been nice to have been able to take tapes with me when evacuating from a hurricane, rather than packing a laundry basket full of hard drives and hoping I didn’t jar them too much. It’d be nice to have a data backup that I can reasonably expect to last for decades (assuming I can find something operable to read it then, along with software to handle it). It’d be nice to have a data backup that’s truly offline. But would it be that nice?


Well, those are mostly SAS drives with a bridge to FC in the outer chassis, the 5.25" drive inside being usable in a tape library or directly in a server - firmware allowing all around.

The ultimate killer for me is the need for a library, it’s just a bridge too far for me.

Which is pretty much the approach I use.

An offline secure copy of your data is not the same thing as a second online copy on disks. Tape serves as an archival role, it’s not meant for random access.

I have about 100 TiB of data. I have all of it mirrored on two different servers. Having the tape as an offline copy of my most important data, about 10 TiB (read: irreplaceable, as in family home videos and pictures), is part of a disaster recovery plan, not my regular backup rotation.

There is a difference.

Sure, that’s valid, but I think our point is that, given the choice between an offline archive on tape or an online one on spinning rust, tape has an uphill struggle.