The Cloud’s Software: A Look Inside Backblaze

March 10th, 2017

When most of us think about “the cloud,” we have an abstract idea that it’s computers in a data center somewhere – racks of blinking lights and lots of loud fans. There’s truth to that. Have a look inside our data center to get an idea. But besides the impressive hardware – and the skilled techs needed to keep it running – there’s software involved. Let’s take a look at a few of the software tools that keep our operation working.

Our data center is populated with Storage Pods, the servers that hold the data you entrust to us if you’re a Backblaze customer or you use B2 Cloud Storage. Inside each Storage Pod are dozens of 3.5-inch spinning hard disk drives – the same kind you’ll find inside a desktop PC. Storage Pods are mounted on racks inside the data center. Those Storage Pods work together in Vaults.

Vault Software

The Vault software that keeps those Storage Pods humming is one of the backbones of our operation. It’s what makes it possible for us to scale our services to meet your needs while maintaining durability and fast performance.

The Vault software distributes data across 20 different Storage Pods, with the data spread evenly across all 20 pods. Drives in the same position inside each Storage Pod are grouped together in software in what we call a “tome.” When a file gets uploaded to Backblaze, it’s split into pieces we call “shards” and distributed across all 20 drives in a tome.
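To make the tome idea concrete, here is a minimal sketch in Python. The function names and the "shard i goes to pod i" rule are assumptions made for illustration; the article only says that a tome groups the drives in the same position across the 20 pods.

```python
PODS_PER_VAULT = 20  # from the article: a Vault spans 20 Storage Pods


def tome_drives(drive_position, pods=PODS_PER_VAULT):
    """Return the (pod_index, drive_position) pairs that make up one tome."""
    return [(pod, drive_position) for pod in range(pods)]


def place_shards(shards, drive_position):
    """Map shard i to the tome's drive in pod i (a hypothetical layout)."""
    drives = tome_drives(drive_position)
    if len(shards) != len(drives):
        raise ValueError("expected exactly one shard per pod")
    return dict(zip(drives, shards))
```

Calling `tome_drives(7)` lists drive 7 in each of the 20 pods, so no two shards of one file ever share a pod.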

Each file is stored as 20 shards: 17 data shards and three parity shards. As the name implies, the data shards comprise the information in the files you upload to Backblaze. Parity shards add redundancy so that a file can be completely restored from a Vault even if some of the pieces are not available.
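The 17-plus-3 split also tells you roughly how much raw disk a file occupies. The sketch below assumes a file is divided evenly across the data shards with the last one padded; how the Vault actually sizes or pads shards isn't described here, so treat the helper names and the padding rule as illustrative only.

```python
from fractions import Fraction

DATA_SHARDS = 17    # from the article
PARITY_SHARDS = 3   # from the article
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS


def shard_size(file_size):
    """Bytes per shard, assuming an even split with padding (ceiling division)."""
    return -(-file_size // DATA_SHARDS)


def stored_bytes(file_size):
    """Raw bytes on disk across all 20 shards under the same assumption."""
    return shard_size(file_size) * TOTAL_SHARDS


# Storage overhead of the scheme: 20/17, or about 1.18x the original size.
OVERHEAD = Fraction(TOTAL_SHARDS, DATA_SHARDS)
```

Under these assumptions a 1,700-byte file becomes twenty 100-byte shards, or 2,000 bytes on disk. Keeping three full copies of the same file would take 5,100 bytes.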

Because those shards are distributed across 20 Storage Pods in 20 cabinets, a Storage Pod can go down and the Vault will still operate unimpeded. An entire cabinet can lose power and the Vault will still work fine.

Files can still be written to the Vault when one Storage Pod is down, because the remaining two parity shards protect the data. Even in the extreme (and unlikely) case where three Storage Pods in a Vault are offline, the files in the Vault are still available because they can be reconstructed from the 17 available pieces.
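The arithmetic behind those cases fits in a few lines of Python. This is only a sketch of the counting rule described above; the function names are made up for illustration and are not part of the Vault software.

```python
DATA_SHARDS = 17   # shards needed to reconstruct a file (from the article)
TOTAL_SHARDS = 20  # one shard per Storage Pod (from the article)


def can_read(pods_offline):
    """A file is readable while at least 17 of its 20 shards are reachable."""
    return TOTAL_SHARDS - pods_offline >= DATA_SHARDS


def parity_left(pods_offline):
    """How many further shard losses the file could still survive."""
    return max(TOTAL_SHARDS - pods_offline - DATA_SHARDS, 0)
```

With one pod down, `parity_left(1)` is 2, matching the two parity shards mentioned above. With three pods down, `can_read(3)` is still true but no margin remains.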

Reed-Solomon Erasure Coding

Erasure coding makes it possible to rebuild a data file even if parts of the original are lost. Having effective erasure coding is vital in a distributed environment like a Backblaze Vault. It helps us keep your data safe even when the hardware that the data is stored on needs to be serviced.

We use Reed-Solomon erasure coding. It’s a proven technique used in Linux RAID systems, by Microsoft in its Azure cloud storage, and by Facebook too. The Backblaze Vault Architecture is capable of delivering 99.99999% annual durability thanks in part to our Reed-Solomon erasure coding implementation.
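For a feel of how this kind of code can rebuild missing pieces, here is a toy Reed-Solomon-style sketch in Python. It is not our implementation: it works on small integers in the prime field mod 257 and uses plain Lagrange interpolation, choices made purely for readability, and every name in it is hypothetical. Byte-oriented implementations typically work in a 256-element Galois field with matrix arithmetic instead.

```python
P = 257  # small prime field, chosen for readability (an assumption of this sketch)


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through the (xi, yi) points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, parity_count):
    """Return the data symbols followed by parity_count parity symbols."""
    points = list(enumerate(data))
    start = len(data)
    parity = [_lagrange_eval(points, x) for x in range(start, start + parity_count)]
    return list(data) + parity


def recover(shards, data_count):
    """Rebuild the data symbols from any data_count survivors (None = lost)."""
    survivors = [(x, y) for x, y in enumerate(shards) if y is not None]
    if len(survivors) < data_count:
        raise ValueError("too many shards lost to reconstruct")
    points = survivors[:data_count]
    return [_lagrange_eval(points, x) for x in range(data_count)]
```

Encoding 17 symbols with 3 parity symbols gives 20 values. Replace any three of them with `None` and `recover(shards, 17)` returns the original 17, which is the same any-17-of-20 property the Vault relies on.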

Here’s our own Brian Beach with an explanation of how Reed-Solomon encoding works:

We threw out the Linux RAID software we had been using prior to the implementation of the Vaults and wrote our own Reed-Solomon implementation from scratch. We’re very proud of it. So much so that we’ve released it as open source, so you can use it in your own projects if you wish.

We developed our Reed-Solomon implementation as a Java library. Why? When we first started this project, we assumed that we would need to write it in C to make it run as fast as we needed. It turns out that the modern Java virtual machines running on our servers are great, and their just-in-time compilers produce code that runs pretty quickly.

All the work we’ve done to build a reliable, scalable, affordable solution for storing data in a “cloud” led to the creation of B2 Cloud Storage. B2 lets you store your data in the cloud for a fraction of what you’d spend elsewhere – 1/4 the price of Amazon S3, for example.

Using Our Storage

Having over 300 Petabytes of data storage available isn’t very useful unless we can store data and reliably restore it too. We offer two ways to store data with Backblaze: via a client application or via direct access. Our client application, Backblaze Computer Backup, is installed on your Mac or Windows system and basically does everything related to automatically backing up your computer. We locate the files that are new or changed and back them up. We manage versions, deduplicate files, and more. The Backblaze app does all the work behind the scenes.

The other way to use our storage is via direct access. You can use a Web GUI, a Command Line Interface (CLI) or an Application Programming Interface (API). With any of these methods, you are in charge of what gets stored in the Backblaze cloud. This is what Backblaze B2 is all about. You can log into B2 and use the Web GUI to drag and drop files to store them in the Backblaze cloud. You decide what gets added and deleted, and how many versions of a file you want to keep. Think of B2 as your very own bucket in the cloud where you can store your files.

We also have mobile apps for iOS and Android devices to help you view and share any backed up files you have on the go. You can download them, play back or view media files, and share them as you need.

We focused on creating a native, integrated experience for you when you use our software. We didn’t take a shortcut to create a Java app for the desktop. On the Mac our app is built using Xcode and on the PC it was built using C. The app is designed for lightweight, unobtrusive performance. If you do need to adjust its performance, we give you that ability. You have control over throttling the backup rate. You can even adjust the number of CPU threads dedicated to Backblaze, if you choose.

When we first released the software almost a decade ago we had no idea that we’d iterate it more than 1,000 times. That’s the threshold we reached late last year, however! We released version 4.3.0 in December. We’re still plugging away at it and have plans for the future, too.

Our Philosophy: Keep It Simple

“Keep It Simple” is the philosophy that underlies all of the technology that powers our hardware. It makes it possible for you to affordably, reliably back up your computers and store data in the cloud.

We’re not interested in creating elaborate, difficult-to-implement solutions or pricing schemes that confuse and confound you. Our backup service is unlimited and unthrottled for one low price. We offer cloud storage for 1/4th the price of the competition. And we make it easy to access with desktop, mobile and web interfaces, command line tools and APIs.

Hopefully we’ve shed some light on the software that lets our cloud services operate. Have questions? Join the discussion and let us know.

Peter Cohen
Peter will never give you up, never let you down, never run around or desert you. He also manages the Backblaze blog.

  • Billy

    I’m a long time Backblaze customer (5+ years). Is my old data stored with this new method (vaults), or is it sitting on legacy servers that don’t have this redundancy?

  • Koshy George

    Did not want to go the whole way and commit to KISS, stopped at KIS