By 2025, the world will generate 463 exabytes of data every day. That’s more or less as much data as one million storage pods. This alone should underscore why being savvy about your online data storage will only become more important in coming years.
To the uninitiated, archiving may sound like another form of data storage: by backing up your files, you are also archiving them. Surprisingly, that’s not the case. Here’s the short story on the difference between these two concepts.
Backups provide for recovering data from hardware failure, data corruption, or other loss.
Archives help you manage space limitations and long-term data retention.
So, if you are eyeing your own ever-expanding data footprint and wondering where and how to securely store it all, we have a few things to tell you about the difference between archiving and backup functions.
In this blog post, we cover how each of these data storage methods helps to ensure that data is:
- Retained for the period you require.
- Protected from loss or unauthorized access.
- Able to be restored or retrieved when needed.
- Structured or tagged for locating specific data.
- Kept current according to your requirements.
Why Back Up Your Data?
The goal of a backup is to make a copy of any file that you currently use and cannot afford to lose. Typically, backups are made regularly or when the original data changes. The original file is preserved, while older backups (iterations) are deleted in favor of newer backups.
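As a rough illustration of how older iterations give way to newer ones, here is a minimal Python sketch of a "keep the newest N" rotation. The `Backup` record, the `prune_backups` function, and the `keep` count are hypothetical names chosen for this example; real backup tools apply their own retention rules, so treat this as a sketch rather than a description of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    """One backup iteration of a file (hypothetical record for illustration)."""
    path: str
    taken_at: datetime


def prune_backups(backups: list[Backup], keep: int) -> tuple[list[Backup], list[Backup]]:
    """Split backups into (kept, deleted), keeping the `keep` newest iterations."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    newest_first = sorted(backups, key=lambda b: b.taken_at, reverse=True)
    return newest_first[:keep], newest_first[keep:]


# Five daily iterations of the same file; the policy keeps the newest three.
backups = [Backup("report.docx", datetime(2024, 1, day)) for day in (1, 2, 3, 4, 5)]
kept, deleted = prune_backups(backups, keep=3)
print([b.taken_at.day for b in kept])     # [5, 4, 3]
print([b.taken_at.day for b in deleted])  # [2, 1]
```

Note that the original file is never part of this list. Only the backup iterations are rotated.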
Any machine that stores valuable data—like computers, servers, VMs, and mobile devices—should be backed up. Backups can include data, operating systems (OS), and application files, or a combination of these depending on your backup approach.
A backup of a desktop or mobile device might include just the user data so that a previous version of a file is recoverable if necessary, while the OS and applications can quickly be restored from original sources if necessary (although you should know that restoring an OS to a new device could lead to significant corruption issues).
In a virtual server environment, a backup could include .VMDK files that contain data and the OS as well as both structured (database) and unstructured data (files). This way, the system can be put back into service as quickly as possible if something happens to the original VM in a VMware, Hyper-V, or another virtual machine environment.
In the case of a ransomware attack, a solid backup strategy is critical for restoring a compromised system rather than paying a ransom in hopes of getting a decryption key to obtain access to your own files (and we do mean hopes because decryption keys aren’t always delivered after ransoms are paid, and even when they are, they don’t always work).
Backups can have other uses, too. You can retrieve an earlier file version because it contains something no longer in the current file or, as is possible with some backup services, share that specific version of that file with a colleague or client.
What Is an Archive?
An archive is also a copy of data, but one made specifically for long-term storage and reference. The original data may or may not be deleted from the source system after the archive copy is made and stored, though it’s common for the archive to be the only copy of the data.
In contrast to a backup, whose purpose is to be able to return a computer or file system to a state it existed in previously, an archive can have multiple purposes.
For those with requirements for easily searching through volumes of media, an archive provides simple queries through metadata attached to each file, which can be applied manually or using AI. For some businesses, an archive provides a permanent record of legal documents, film, photos, directories and more to satisfy the information retention and deletion compliance required for HIPAA, SSAE-18/SOC 2 data centers and service level agreements (SLAs).
An archive is frequently used to ease the burden on faster and more frequently accessed data storage systems. In addition, archival storage systems are usually less expensive, creating a strong motivation to move historical files elsewhere to save money on data storage.
Archives are often created based on these criteria:
- The age of the data.
- The amount of time since data was last accessed.
- Whether or not the main user is still with the organization.
- Whether the associated project has been completed or closed.
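The criteria above can be sketched as a simple selection rule. In this Python example, the `FileRecord` fields, the one-year age threshold, and the 180-day idle threshold are all assumptions made for illustration, and the choice to archive when any single criterion is met is just one possible policy. Your own retention rules may combine them differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical description of a file on primary storage."""
    name: str
    created: datetime
    last_accessed: datetime
    owner_active: bool      # is the main user still with the organization?
    project_closed: bool    # has the associated project been completed or closed?


def should_archive(
    f: FileRecord,
    now: datetime,
    max_age: timedelta = timedelta(days=365),   # assumed threshold
    max_idle: timedelta = timedelta(days=180),  # assumed threshold
) -> bool:
    """Return True if the file meets at least one of the archive criteria."""
    return (
        now - f.created > max_age
        or now - f.last_accessed > max_idle
        or not f.owner_active
        or f.project_closed
    )


now = datetime(2024, 6, 1)
files = [
    FileRecord("draft.psd", datetime(2024, 5, 1), datetime(2024, 5, 30), True, False),
    FileRecord("2019-shoot.mov", datetime(2019, 3, 1), datetime(2019, 6, 1), True, True),
    FileRecord("handover.txt", datetime(2024, 4, 1), datetime(2024, 5, 1), False, False),
]
candidates = [f.name for f in files if should_archive(f, now)]
print(candidates)  # ['2019-shoot.mov', 'handover.txt']
```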
The structure of an archive is important for retrieval. Archives can use metadata describing the project and can automatically add relevant metadata, or the user can tag data manually for easier retrieval. Common metadata can include business information describing the data, or in the case of photos and videos, the equipment, camera settings, and geographical location where the media was created.
Artificial intelligence (AI) can identify and catalog subject matter in data such as photos and videos to make it easier to find. AI tools will become increasingly important as growing businesses archive more data and need to be able to find it based on parameters that might not be known at the time the data was archived.
|Backup|Archive|
|---|---|
|Enables rapid recovery of live, changing data|Stores unchanging data that is no longer in use but must be retained|
|One of multiple copies of data|Usually the only remaining copy of data|
|Restore speed: crucial|Retrieval speed: not crucial|
|Short-term retention|Long-term retention|
|Retained for as long as data is in active use|Retained for a required period or indefinitely|
|Duplicate copies are periodically overwritten|Data cannot be altered or deleted|
What’s the Difference Between Restore and Retrieve?
In general, backup systems restore, and archive systems retrieve. The tools needed to perform these functions are different.
If you are interested in restoring something from a backup, it usually is a single file, a server, or structured data such as a database that needs to be restored to a specific point in time. You need to know a lot about the data:
- The last backup date.
- The database or folder.
- The filename.
- Its date.
- Data type.
- Owner’s name.
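To make "restored to a specific point in time" concrete, here is a minimal Python sketch that picks the most recent backup taken at or before the moment you want to return to. The catalog of `(path, backup time)` pairs and the `find_restore_point` function are hypothetical, since each backup product keeps its own catalog in its own format.

```python
from datetime import datetime
from typing import Optional

# Hypothetical backup catalog: one (path, time the backup was taken) pair per iteration.
Catalog = list[tuple[str, datetime]]


def find_restore_point(catalog: Catalog, path: str, point_in_time: datetime) -> Optional[datetime]:
    """Return the time of the newest backup of `path` taken at or before `point_in_time`."""
    candidates = [taken for p, taken in catalog if p == path and taken <= point_in_time]
    return max(candidates, default=None)


catalog = [
    ("db/orders.sql", datetime(2024, 1, 1, 2, 0)),
    ("db/orders.sql", datetime(2024, 1, 2, 2, 0)),
    ("db/orders.sql", datetime(2024, 1, 3, 2, 0)),
    ("docs/plan.txt", datetime(2024, 1, 2, 2, 0)),
]

# We want the database as it was on the evening of January 2.
restore_from = find_restore_point(catalog, "db/orders.sql", datetime(2024, 1, 2, 18, 0))
print(restore_from)  # 2024-01-02 02:00:00
```

The lookup only works because the path and the target date are known up front, which is why the details listed above matter for a restore.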
When you retrieve data from an archive, the data is connected in some manner, such as date, email recipient, period of time, or another set of parameters that can be specified in a search. A typical retrieval query might be to obtain all files related to all emails sent by a person during a specific date range. It can seem like searching for a needle in a haystack, but in this case you at least know approximately where in the haystack you dropped this specific needle.
Trying to retrieve data from a backup in the same way would be like searching a haystack for a pin that has changed over time. You’d need to keep rigorous records of where and when the files were backed up, what medium they were backed up to, and myriad other pieces of information that would need to be recorded at the time of backup.
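By contrast, an archive retrieval like the one described above (all files related to emails sent by a person during a specific date range) is a search over metadata. The Python sketch below assumes a hypothetical `ArchivedEmail` record with a sender, a sent date, and attachment names. A real archive system would expose its own search interface, so this only illustrates the shape of the query.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedEmail:
    """Hypothetical archive entry: an email plus the metadata used to find it."""
    sender: str
    sent_on: date
    attachments: tuple[str, ...]


def files_sent_by(archive: list[ArchivedEmail], sender: str, start: date, end: date) -> list[str]:
    """List attachment names from emails `sender` sent between `start` and `end`, inclusive."""
    return [
        name
        for email in archive
        if email.sender == sender and start <= email.sent_on <= end
        for name in email.attachments
    ]


archive = [
    ArchivedEmail("dana@example.com", date(2023, 3, 5), ("contract.pdf",)),
    ArchivedEmail("dana@example.com", date(2023, 4, 20), ("invoice.pdf", "scope.docx")),
    ArchivedEmail("lee@example.com", date(2023, 4, 21), ("notes.txt",)),
    ArchivedEmail("dana@example.com", date(2023, 9, 1), ("renewal.pdf",)),
]

results = files_sent_by(archive, "dana@example.com", date(2023, 3, 1), date(2023, 6, 30))
print(results)  # ['contract.pdf', 'invoice.pdf', 'scope.docx']
```

The query needs no filenames or backup dates, only the parameters that connect the files.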
By definition, backup systems keep copies of data currently in use, so maintaining backups for lengthy periods of time goes beyond the capabilities of backup systems and would require manual management.
The bottom line: don’t use a backup as an archive.
Why You Need Both Backup and Archive
It’s clear that a backup and an archive serve different purposes. Do you need both?
If you’re a storage-heavy business, the wise answer is yes. Consider the business reasons for choosing both data storage methods:
- Ease of Use. Reliable, secure, remotely accessible backups and archives make it easier to provide archived data, overcome issues, and meet the budget and timeline your clients desire.
- Resources. Automated and supported archives and backups keep your resources focused on business, not on technological infrastructure problems that could cause costly reboots and replication failure.
- Cost. A robust, secure archive and backup solution that is affordable and offers transparent pricing will help you to make better financial decisions related to your data production and protection.
- Compliance. Backups and archives help you to meet SLAs and industry best practices and reassure your customers.
- Protection. An archive and backup system will keep your proprietary data safe, secure, and accessible as needed so you never fall behind because of lost or corrupted data.
Backblaze for Backup and Archiving
In the Backblaze product line, Backblaze Personal Backup and Backblaze Business Backup include unlimited backups for Windows and macOS for a flat fee. Backblaze B2 Cloud Storage is general purpose, pay-as-you-go object cloud storage. It is ideal for archiving, backing up servers, VMs, NAS, Linux, Macs, and PCs, and storing general object data using one of the many integrations available from Backblaze’s partners. See our pricing page for details about Backblaze B2 costs.
The task of backing up your data will only become larger as the amount of data you produce grows each year. For a backup and archiving service that won’t eat up your cash flow with the next data-heavy project—or 10—let Backblaze handle migrating everything to our B2 Cloud Storage for free. Scale up with the right tool and service for your growing storage needs.