Linux users have a variety of options for handling data backup. The choices range from free and open-source programs to paid commercial tools, and include command-line (CLI) applications, applications with a graphical interface (GUI), and some that offer both.
If you take a look at our Backblaze B2 Cloud Storage Integrations page, you will see a number of offerings that enable you to back up your Linux desktops and servers to Backblaze B2. These include CloudBerry, Duplicity, Duplicacy, 45 Drives, GoodSync, HashBackup, QNAP, Restic, and Rclone, plus other choices for NAS and hybrid uses.
Backing Up Linux: Old School vs. New School
We’re highlighting Duplicity and Restic because they exemplify two different philosophical approaches to data backup: “Old School” (Duplicity) vs. “New School” (Restic).
Old School (Duplicity)
In the old school model, data is written sequentially to the storage medium. Once a section of data is recorded, new data is written starting where that section of data ends. It’s not possible to go back and change the data that’s already been written.
This old-school model has long been associated with the use of magnetic tape, a prime example of which is the LTO (Linear Tape-Open) standard. In this “write once” model, files are always appended to the end of the tape. If a file is modified and overwritten or removed from the volume, the associated tape blocks used are not freed up: they are simply marked as unavailable, and the used volume capacity is not recovered. Data is deleted and capacity recovered only if the whole tape is reformatted. As a Linux/Unix user, you undoubtedly are familiar with the TAR archive format, which is an acronym for Tape ARchive. TAR has been around since 1979 and was originally developed to write data to sequential I/O devices with no file system of their own.
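To see the append-only behavior in practice, here is a minimal sketch using Python's standard tarfile module. The archive name backup.tar, the file name notes.txt, and the helper functions are made up for illustration. It writes the same file to an archive twice: the second write does not replace the first, it is simply added after it, and readers treat the last copy as the current one.

```python
import io
import os
import tarfile
import tempfile


def append_member(archive_path, name, data):
    """Append one file to an uncompressed tar archive (created if missing)."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    # Mode "a" only ever adds members at the end of the archive.
    with tarfile.open(archive_path, "a") as tar:
        tar.addfile(info, io.BytesIO(data))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "backup.tar")  # hypothetical archive name
        append_member(archive, "notes.txt", b"first version")
        append_member(archive, "notes.txt", b"second version")
        with tarfile.open(archive, "r") as tar:
            names = tar.getnames()
            # getmember() returns the last occurrence of a duplicated name.
            latest = tar.extractfile(tar.getmember("notes.txt")).read()
        return names, latest


names, latest = demo()
print(names)   # both copies of notes.txt are still in the archive
print(latest)  # the most recently appended content
```

The older copy still occupies space in the archive, which mirrors the tape behavior described above: space is only reclaimed by rewriting the whole volume.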
It is from the use of tape that we get the full backup/incremental backup approach to backups. A backup sequence begins with a full backup of data. Each incremental backup then contains only what has changed since the previous backup. This continues until the next full backup is made and the process starts over, filling more and more tape or whatever medium is being used. Recovering a system can be time-consuming if a number of incremental backups followed the full backup, because the full backup must be restored first, followed by each incremental in order, to obtain the latest state of the data. Most users run frequent full backups to avoid this.
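The full/incremental cycle can be sketched in a few lines of Python. This is a toy model, not how any particular tool stores its data: a "backup" here is just a dictionary mapping file paths to contents, and the function names and sample files are invented for the example.

```python
def make_incremental(previous, current):
    """Record only what changed between two {path: content} states."""
    changed = {p: c for p, c in current.items() if previous.get(p) != c}
    deleted = sorted(p for p in previous if p not in current)
    return {"changed": changed, "deleted": deleted}


def restore(full, incrementals):
    """Start from the full backup, then replay every incremental in order."""
    state = dict(full)
    for inc in incrementals:
        state.update(inc["changed"])
        for path in inc["deleted"]:
            state.pop(path, None)
    return state


# Three days of (hypothetical) file system states.
monday = {"a.txt": "v1", "b.txt": "v1"}
tuesday = {"a.txt": "v2", "b.txt": "v1", "c.txt": "v1"}
wednesday = {"a.txt": "v2", "c.txt": "v1"}

full = dict(monday)  # the full backup
incrementals = [
    make_incremental(monday, tuesday),
    make_incremental(tuesday, wednesday),
]

restored = restore(full, incrementals)
print(restored == wednesday)
```

Note that restore has to walk every incremental since the full backup, which is why long chains make recovery slow.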
This is the model used by Duplicity: full and incremental backups. Duplicity backs up files by producing encrypted, digitally signed, versioned, TAR-format volumes and uploading them to a remote location, including Backblaze B2 Cloud Storage. Released under the terms of the GNU General Public License (GPL), Duplicity is free software.
With Duplicity, the first archive is a complete (full) backup, and each subsequent (incremental) backup adds only the differences from the latest full or incremental backup. A chain consisting of a full backup and a series of incremental backups can be restored to the point in time at which any of the incremental steps was taken. If any incremental backup in the chain is missing, reconstructing a complete and current backup becomes much more difficult and sometimes impossible.
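The dependency between links in a chain can be illustrated with a small Python sketch. It is a simplified model and not Duplicity's actual on-disk format: the chain is a dictionary keyed by sequence number (0 is the full backup), each incremental holds only changed files, and deletions are ignored. BrokenChainError and restore_to are hypothetical names.

```python
class BrokenChainError(Exception):
    """Raised when a link needed for the requested restore is missing."""


def restore_to(chain, target):
    """Rebuild the state as of backup number `target` (0 = the full backup)."""
    if 0 not in chain:
        raise BrokenChainError("full backup is missing")
    state = dict(chain[0])
    for step in range(1, target + 1):
        if step not in chain:
            raise BrokenChainError(f"incremental {step} is missing")
        state.update(chain[step])
    return state


chain = {
    0: {"a.txt": "v1"},  # full backup
    1: {"a.txt": "v2"},  # incremental 1
    2: {"b.txt": "v1"},  # incremental 2
    3: {"a.txt": "v3"},  # incremental 3
}

print(restore_to(chain, 1))  # point-in-time restore to incremental 1
print(restore_to(chain, 3))  # latest state

del chain[2]  # simulate a lost incremental
try:
    restore_to(chain, 3)
    chain_intact = True
except BrokenChainError as err:
    chain_intact = False
    print("restore failed:", err)
```

Restores to points before the missing link still work, but everything after it in the chain can no longer be rebuilt reliably.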
Duplicity help text (partial)
Duplicity is available under many Unix-like operating systems (such as Linux, BSD, and Mac OS X) and ships with many popular Linux distributions including Ubuntu, Debian, and Fedora. It also can be used with Windows under Cygwin.
We recently published a KB article on How to configure Backblaze B2 with Duplicity on Linux that demonstrates how to set up Duplicity with B2 and back up and restore a directory from Linux.
New School (Restic)
With the arrival of non-sequential storage media, such as disk drives, and new ideas such as deduplication, comes the new school approach, which is used by Restic. Data can be written and changed anywhere on the storage medium, and much of the efficiency of this approach comes from deduplication, a process that eliminates redundant copies of data and reduces storage overhead. Deduplication techniques ensure that only one unique instance of each piece of data is retained on the storage media, greatly increasing storage efficiency and flexibility.
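A minimal Python sketch shows the core idea of deduplication: split data into chunks, identify each chunk by a hash of its content, and store a chunk only the first time it is seen. The ChunkStore class and the tiny fixed 4-byte chunk size are assumptions made to keep the example readable. Real tools, Restic included, choose chunk boundaries in more sophisticated ways.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for demonstration only


class ChunkStore:
    """Content-addressed store: identical chunks are kept only once."""

    def __init__(self):
        self.chunks = {}  # chunk id (SHA-256 hex digest) -> chunk bytes

    def put(self, data):
        """Store data and return the list of chunk ids that describes it."""
        ids = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            chunk_id = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(chunk_id, chunk)  # no-op if already stored
            ids.append(chunk_id)
        return ids

    def get(self, ids):
        """Reassemble the original data from its chunk ids."""
        return b"".join(self.chunks[i] for i in ids)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore()
first = store.put(b"AAAABBBBAAAA")  # 12 bytes, but "AAAA" repeats
second = store.put(b"AAAACCCC")     # shares "AAAA" with the first file

print(len(store.chunks), store.stored_bytes())
print(store.get(first), store.get(second))
```

Twenty bytes of input end up as three unique chunks, and both files can still be reassembled exactly from their chunk lists.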
Restic is a relatively new, multi-platform command-line backup program that is designed to be fast, efficient, and secure. Restic supports a variety of backends for storing backups, including a local directory, an SFTP server, an HTTP REST server, and a number of cloud storage providers, including Backblaze B2.
Files are uploaded to a B2 bucket as deduplicated, encrypted chunks. Each time a backup runs, only changed data is backed up. On each backup run, a snapshot is created enabling restores to a specific date or time.
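The relationship between chunks and snapshots can be modeled with a short Python sketch. This is an illustration of the concept only and does not reflect Restic's real repository layout: the Repository class is hypothetical, chunks are a fixed 4 bytes, and encryption is left out entirely. Each backup run uploads only chunks the repository has not seen, then records a snapshot listing which chunks make up each file.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for demonstration only


class Repository:
    """Toy repository: a shared pool of chunks plus a list of snapshots."""

    def __init__(self):
        self.chunks = {}     # chunk id -> chunk bytes
        self.snapshots = []  # each snapshot: {path: [chunk ids]}

    def backup(self, files):
        """Back up {path: bytes}; return how many new chunks were uploaded."""
        uploaded = 0
        tree = {}
        for path, data in files.items():
            ids = []
            for start in range(0, len(data), CHUNK_SIZE):
                chunk = data[start:start + CHUNK_SIZE]
                chunk_id = hashlib.sha256(chunk).hexdigest()
                if chunk_id not in self.chunks:
                    self.chunks[chunk_id] = chunk
                    uploaded += 1
                ids.append(chunk_id)
            tree[path] = ids
        self.snapshots.append(tree)
        return uploaded

    def restore(self, index):
        """Rebuild every file exactly as it was in snapshot number `index`."""
        tree = self.snapshots[index]
        return {p: b"".join(self.chunks[i] for i in ids) for p, ids in tree.items()}


repo = Repository()
day1 = {"a.txt": b"AAAABBBB", "b.txt": b"CCCC"}
day2 = {"a.txt": b"AAAADDDD", "b.txt": b"CCCC"}  # only part of a.txt changed

uploaded_day1 = repo.backup(day1)
uploaded_day2 = repo.backup(day2)
print(uploaded_day1, uploaded_day2)
print(repo.restore(0) == day1, repo.restore(1) == day2)
```

Unlike a full/incremental chain, every snapshot here is restored directly from the chunk pool, so no snapshot depends on replaying an earlier one.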
Restic assumes that the storage location for the repository is shared, so it always encrypts the backed-up data. This is in addition to any encryption and security provided by the storage provider.
Restic help text
Restic is free, open-source software. It is licensed under the BSD 2-Clause License and actively developed on GitHub.
There’s a lot more you can do with Restic, including adding tags, mounting a repository locally, and scripting. To learn more, you can review the documentation at https://restic.readthedocs.io.
Alongside this blog post, we published a KB article, How to configure Backblaze B2 with Restic on Linux, in which we show how to set up Restic for use with B2 and how to back up and restore a home directory from Linux to B2.
Which Linux Backup Method is Right for You?
While Duplicity is a popular, widely available, and useful program, many users of cloud storage such as B2 are moving to new-school solutions like Restic that take better advantage of the non-sequential access capabilities and speed of modern storage media used by cloud storage providers.
Tell us how you’re backing up Linux
Please let us know in the comments what you’re using for Linux backups, and if you have experience using Duplicity, Restic, or other Unix/Linux backup software with Backblaze B2.