What’s the Diff: Backup vs Archive

By Roderick Bauer | August 2nd, 2018

Backups and archives serve different functions, yet it’s common to hear the terms used interchangeably in cloud storage. It’s important to understand the difference between the two so that your data storage methodology meets your needs. Your data should be:

  1. Retained for the period of time you require.
  2. Protected from loss or unauthorized access.
  3. Able to be restored or retrieved when needed.
  4. Structured or tagged to enable locating specific data.
  5. Kept current according to your requirements.

Our two choices can be broadly categorized:

  • Backup is for recovery from hardware failure or recent data corruption or loss
  • Archive is for space management and long-term retention

What Is a Backup?

A backup is a copy of your data that is made to protect against loss of that data. Typically, backups are made on a regular basis according to a time schedule or when the original data changes. The original data is not deleted, but older backups are often deleted in favor of newer backups.
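
To make the rotation idea concrete, here is a minimal Python sketch of one simple retention rule: keep only the newest N backups and mark the rest for deletion. The `BackupSnapshot` type, the `prune_backups` function, and the keep-the-latest-N rule are illustrative assumptions, not how any particular backup product (Backblaze’s included) decides what to keep.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupSnapshot:
    """One point-in-time backup copy (hypothetical model for illustration)."""
    taken_at: datetime
    label: str


def prune_backups(snapshots: list[BackupSnapshot],
                  keep_latest: int) -> tuple[list[BackupSnapshot], list[BackupSnapshot]]:
    """Split backups into (kept, to_delete), keeping the `keep_latest` newest.

    Only backup copies are affected; the original data is never touched.
    """
    if keep_latest < 1:
        raise ValueError("keep_latest must be at least 1")
    newest_first = sorted(snapshots, key=lambda s: s.taken_at, reverse=True)
    return newest_first[:keep_latest], newest_first[keep_latest:]


if __name__ == "__main__":
    start = datetime(2018, 8, 1)
    # One backup per night for a week.
    nightly = [BackupSnapshot(start + timedelta(days=i), f"night-{i}") for i in range(7)]
    kept, to_delete = prune_backups(nightly, keep_latest=3)
    print([s.label for s in kept])       # ['night-6', 'night-5', 'night-4']
    print([s.label for s in to_delete])  # ['night-3', 'night-2', 'night-1', 'night-0']
```

Real backup tools often use richer schemes, such as keeping daily copies for a week and weekly copies for a month, but the principle is the same: the original data stays put while older copies are discarded.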

Desktop computers, servers, VMs, and mobile devices are all commonly backed up. Backups can include data, OS and application files, or a combination of these according to the backup methodology and purpose.

The goal of a backup is to make a copy of anything in current use that can’t afford to be lost. A backup of a desktop or mobile device might include just the user data so that a previous version of a file can be recovered if necessary. On these types of devices, an assumption is often made that the OS and applications can easily be reinstalled from original sources if necessary (or that restoring an OS to a new device could cause significant corruption issues). In a virtual server environment, a backup could include the virtual disk files (such as VMware .VMDK files) that contain the OS and data, covering both structured data (databases) and unstructured data (files), so that the system can be put back into service as quickly as possible if something happens to the original VM in a VMware, Hyper-V, or other virtual machine environment.

In the case of a ransomware attack, a solid backup strategy can mean the difference between being able to restore a compromised system and having to pay a ransom in the uncertain hope of getting a decryption key for files the attacker has encrypted.

Backups can have additional uses. A user might go to a backup to retrieve an earlier version of a file because it contains something no longer in the current file, or, as is possible with some backup services such as Backblaze Backup, to share a file with a colleague or other person.

What Is an Archive?

An archive is a copy of data made for long-term storage and reference. The original data may or may not be deleted from the source system after the archive copy is made and stored, though it is common for the archive to be the only copy of the data.

In contrast to a backup, whose purpose is to return a computer or file system to a state in which it existed previously, an archive can have multiple purposes. An archive can provide an individual or organization with a permanent record of important papers, legal documents, correspondence, and other matters. Often, an archive is used to meet information retention requirements for corporations and businesses. If a dispute or inquiry arises about a business practice, contract, financial transaction, or employee, the records pertaining to that subject can be obtained from the archive.

An archive is frequently used to ease the burden on faster and more frequently accessed data storage systems. Older data that is unlikely to be needed often is put on systems that don’t need to have the speed and accessibility of systems that contain data still in use. Archival storage systems are usually less expensive, as well, so a strong motivation is to save money on data storage.

Archives are often created based on the age of the data or whether the project the data belongs to is still active. An archiving program might send data to an archive if it hasn’t been accessed in a specified amount of time, when it has reached a certain age, if a person is no longer with the organization, or if the files have been marked for storage because the project has been completed or closed.
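
The archiving criteria above can be expressed as a simple policy check. This is a hypothetical Python sketch: the `FileRecord` fields, the `archive_reasons` function, and the one-year idle and three-year age thresholds are assumptions chosen for illustration, not defaults of any real archiving program.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical file-inventory entry used only for this illustration."""
    path: str
    created: datetime
    last_accessed: datetime
    owner_active: bool     # is the file's owner still with the organization?
    project_closed: bool   # has the file's project been completed or closed?


def archive_reasons(f: FileRecord, now: datetime,
                    max_idle: timedelta = timedelta(days=365),
                    max_age: timedelta = timedelta(days=3 * 365)) -> list[str]:
    """Return the (assumed) policy rules that make a file an archive candidate.

    An empty list means the file stays on primary storage.
    """
    reasons = []
    if now - f.last_accessed > max_idle:
        reasons.append("not accessed recently")
    if now - f.created > max_age:
        reasons.append("reached retention age")
    if not f.owner_active:
        reasons.append("owner left organization")
    if f.project_closed:
        reasons.append("project closed")
    return reasons


if __name__ == "__main__":
    now = datetime(2018, 8, 2)
    report = FileRecord(
        path="projects/alpha/report.docx",
        created=datetime(2014, 1, 1),
        last_accessed=datetime(2017, 1, 1),
        owner_active=True,
        project_closed=False,
    )
    print(archive_reasons(report, now))  # ['not accessed recently', 'reached retention age']
```

Returning the list of matching reasons, rather than a bare yes/no, also gives the archiving program something useful to record as metadata alongside the archived file.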

Archives also can be created using metadata describing the project. An archiving program can automatically add relevant metadata, or the user can tag data manually to aid in future retrieval. Common metadata added can be business information describing the data, or in the case of photos and videos, the equipment, camera settings, and geographical location where the media was created. Artificial intelligence (AI) can be used to identify and catalog subject matter in some data such as photos and videos to make it easier to find the data at a later date. AI tools will become increasingly important as we archive more data and need to be able to find it based on parameters that might not be known at the time the data was archived.

What’s the Diff?

Backup | Archive
--- | ---
Enables rapid recovery of live, changing data | Stores unchanging data that is no longer in use but must be retained
One of multiple copies of data | Usually the only remaining copy of data
Restore speed: crucial | Retrieval speed: not crucial
Short-term retention: kept for as long as the data is in active use | Long-term retention: kept for the required period or indefinitely
Duplicate copies are periodically overwritten | Data cannot be altered or deleted

What’s the Difference Between Restore and Retrieve?

In general, backup systems restore and archive systems retrieve. The tools needed to perform these functions are different.

When you restore something from a backup, it is usually a single file, a server, or structured data such as a database that needs to be returned to a specific point in time. You need to know a lot about the data: where it was located when it was backed up, the database or folder it was in, the name of the file, when it was backed up, and so forth.
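
The following hypothetical Python sketch shows why a restore needs so much detail up front: it can only find the right copy if you supply the exact original path and the point in time you want to return to. The `history` layout and the `restore_version` function are assumptions made for illustration, not the interface of any real backup service.

```python
from bisect import bisect_right
from datetime import datetime


def restore_version(history: dict[str, list[tuple[datetime, bytes]]],
                    path: str, point_in_time: datetime) -> bytes:
    """Return the newest backed-up content of `path` at or before `point_in_time`.

    `history` (a hypothetical structure) maps each original file path to its
    backup versions as (backup_time, content) pairs, sorted oldest first.
    """
    versions = history.get(path)
    if not versions:
        raise FileNotFoundError(f"no backups of {path}")
    times = [t for t, _ in versions]
    # Number of versions taken at or before the requested moment.
    idx = bisect_right(times, point_in_time)
    if idx == 0:
        raise LookupError(f"{path} was not backed up before {point_in_time}")
    return versions[idx - 1][1]


if __name__ == "__main__":
    history = {
        "/home/ana/budget.xlsx": [
            (datetime(2018, 7, 30, 1, 0), b"v1"),
            (datetime(2018, 7, 31, 1, 0), b"v2"),
            (datetime(2018, 8, 1, 1, 0), b"v3"),
        ]
    }
    # Return the file to how it looked at midday on July 31.
    print(restore_version(history, "/home/ana/budget.xlsx", datetime(2018, 7, 31, 12, 0)))  # b'v2'
```

If either the path or the time is wrong, the lookup fails or returns the wrong version, which is exactly the bookkeeping burden described above.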

When you retrieve data from an archive, the data is connected in some manner, such as by date, email recipient, period of time, or another set of parameters that can be specified in a search. A typical retrieval query might be to obtain all files related to a project name, or all emails sent by a person during a specific period of time.
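
By contrast, an archive retrieval can be sketched as a search over metadata. In this hypothetical Python example, the `ArchivedItem` type, its tag names, and the `retrieve` function are illustrative assumptions; a real archive system would use its own index and query language.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ArchivedItem:
    """An archived object plus metadata recorded when it was archived (hypothetical)."""
    key: str
    created: date
    tags: dict[str, str] = field(default_factory=dict)


def retrieve(items: list[ArchivedItem],
             start: Optional[date] = None,
             end: Optional[date] = None,
             **tag_filters: str) -> list[str]:
    """Return keys of items whose tags match every filter and whose creation
    date falls within [start, end] (either bound may be omitted).

    No original path or backup time is needed: the search runs over metadata.
    """
    matches = []
    for item in items:
        if start is not None and item.created < start:
            continue
        if end is not None and item.created > end:
            continue
        if all(item.tags.get(k) == v for k, v in tag_filters.items()):
            matches.append(item.key)
    return matches


if __name__ == "__main__":
    archive = [
        ArchivedItem("mail/0001.eml", date(2017, 3, 2), {"type": "email", "sender": "lee"}),
        ArchivedItem("mail/0002.eml", date(2017, 9, 14), {"type": "email", "sender": "lee"}),
        ArchivedItem("mail/0003.eml", date(2017, 9, 20), {"type": "email", "sender": "kim"}),
        ArchivedItem("docs/spec.pdf", date(2016, 5, 1), {"project": "atlas"}),
    ]
    # All emails sent by "lee" during the second half of 2017.
    print(retrieve(archive, date(2017, 7, 1), date(2017, 12, 31), type="email", sender="lee"))
    # ['mail/0002.eml']
    # All files related to a project name.
    print(retrieve(archive, project="atlas"))  # ['docs/spec.pdf']
```

The two queries mirror the examples in the paragraph above; how useful they are depends entirely on the metadata captured at archive time.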

Trying to use a backup as an archive can present problems. You would need to keep rigorous records of where and when the files were backed up, what medium they were backed up to, and myriad other pieces of information that would need to be recorded at the time of backup. By definition, backup systems keep copies of data currently in use, so maintaining backups for lengthy periods of time goes beyond what backup systems are designed to do and would require manual management.

The bottom line is don’t use a backup as an archive. Select the approach that suits your needs: a backup to keep additional copies of data currently in use in case something happens to your primary copy, or an archive to keep a permanent (and perhaps the only) record of important data you wish to retain for personal, business, or legal reasons.

Why You Need Both Backup and Archive

It’s clear that a backup and an archive have different uses. Do you need both?

If you’re a business, the wise choice is yes. You need to make sure that your active business data is protected from accidental or malicious loss, and that your important records are maintained as long as necessary for business and legal reasons. If you are an individual or a small business with documents, photos, videos, and other media, you also need both backup and archive to ensure that your data is protected both short and long term and available and retrievable when you need it.

Backblaze for Backup and Archiving

In the Backblaze product line, Backblaze Personal Backup and Backblaze Business Backup provide flat-fee, unlimited backup of Windows and Macintosh computers. Backblaze B2 Cloud Storage is pay-per-GB, general-purpose object cloud storage. It is ideal for archiving, though it can also be used to back up servers, VMs, NAS devices, Linux machines, Macs, and PCs, and to store general object data using one of the many integrations available from Backblaze’s partners. See our pricing page for details about B2 costs.

Selecting the right tools and services for backup and archiving is essential. Each has a feature set suited to its task. Trying to use backup for archiving, or archiving for backup, is like trying to fit a round peg into a square hole. It’s best to use the right tool and service for the data storage function you require.

Roderick Bauer

Content Director at Backblaze

Roderick has held marketing, engineering, and product management positions with Adobe, Microsoft, Autodesk, and several startups. He has consulted for Apple, Microsoft, Hewlett-Packard, Stanford University, Dell, the Pentagon, and the White House. He was a Ford-Mozilla Fellow in Media and Democracy with Common Cause in Washington, D.C., where he advocated for a free, open, and accessible internet for all, reducing media consolidation, and transparency in politics and the media.

He is Content Director for Backblaze.
