Backblaze Vaults: Zettabyte-Scale Cloud Storage Architecture

March 11th, 2015

We are thrilled to share the results of a project that has engaged the Backblaze engineering team for the last year: Backblaze Vaults, a major step forward in our cloud storage service’s technology stack. Currently Backblaze stores over 150 petabytes of data and has recovered over 10 billion files for customers of our cloud backup service. The new storage Vaults will form the core of our cloud services moving forward. Backblaze Vaults are not only incredibly durable, scalable, and performant, but they dramatically improve availability and operability, while still being incredibly cost-efficient at storing data. We previously shared the design of the original Storage Pod hardware we developed; here we’ll share the architecture and approach of the cloud storage software that makes up a Backblaze Vault.

Backblaze Vault Architecture for Cloud Storage

The Vault design follows the overriding design principle that Backblaze has always followed: “keep it simple.” As with the storage pods themselves, the new Vault storage software relies on tried and true technologies, used in a straightforward way, to build a simple, reliable, and inexpensive system.

A Backblaze Vault is the combination of the Backblaze Vault cloud storage software and the Backblaze Storage Pod hardware.

Putting The Intelligence in the Software

Another design principle for Backblaze is to expect as little as possible from the hardware and abstract the intelligence of the cloud storage into the software. This was the case when we originally designed our Storage Pod hardware and continues as a design goal with Vaults. In addition to leveraging our low-cost Storage Pods, Vaults continue to benefit from the lower cost of consumer-grade hard drives, and cleanly handle their common failure modes.

Distributing Data Across 20 Storage Pods

A Backblaze Vault is comprised of 20 Storage Pods, with the data evenly spread across all 20 pods. Each Storage Pod in a given vault has the same number of drives, and the drives are all the same size.

Drives in the same drive position in each of the 20 Storage Pods are grouped together into a storage unit we call a “tome”. Each file is stored in one tome, and is spread out across the tome for reliability and availability.
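
As a rough illustration of the grouping described above, the sketch below lists the drives that make up one tome. The helper name `tome_drives` and the `(pod, drive_position)` representation are assumptions made for this example, not Backblaze's actual code; the figure of 45 drives per pod is quoted later in this post.

```python
PODS_PER_VAULT = 20   # from the post
DRIVES_PER_POD = 45   # from the post (each pod holds 45 drives)


def tome_drives(tome_index):
    """Return the (pod, drive_position) pairs that form one tome.

    Hypothetical helper: a tome is the set of drives sitting in the same
    drive position in each of the 20 pods of a Vault.
    """
    if not 0 <= tome_index < DRIVES_PER_POD:
        raise ValueError("tome index out of range")
    return [(pod, tome_index) for pod in range(PODS_PER_VAULT)]


# Every tome spans all 20 pods, one drive per pod.
print(len(tome_drives(7)))
```

Under this layout a Vault with 45 drives per pod has 45 tomes, and no two drives of a tome ever share a pod.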

[Image: Backblaze Vault Tome]

Every file uploaded to a Vault is broken into pieces before being stored. Each of those pieces is called a “shard”. Parity shards are added for redundancy, so that a file can be fetched from a vault even if some of the pieces are not available.

Each file is stored as 20 shards: 17 data shards and 3 parity shards. Because those shards are distributed across 20 storage pods in 20 cabinets, the Vault is resilient to the failure of a storage pod, or even a power loss to an entire cabinet.

Files can be written to the Vault when one pod is down, and still have 2 parity shards to protect the data. Even in the extreme and unlikely case where three Storage Pods in a Vault lose power, the files in the vault are still available because they can be reconstructed from the 17 pieces that are available.

Storing Shards

Each of the drives in a Vault has a standard Linux file system, ext4, on it. This is where the shards are stored. There are fancier file systems out there, but we don’t need them for Vaults. All that is needed is a way to write files to disk, and read them back. Ext4 is good at handling power failure on a single drive cleanly, without losing any files. It’s also good at storing lots of files on a single drive, and providing efficient access to them.

Compared to a conventional RAID, we have swapped the layers here by putting the file systems under the replication. Usually, RAID puts the file system on top of the replication, which means that a file system corruption can lose data. With the file system below the replication, a Vault can recover from a file system corruption, because it can lose at most one shard of each file.

Creating Flexible and Optimized Reed-Solomon Erasure Coding

Just like RAID implementations, the Vault software uses Reed-Solomon erasure coding to create the parity shards. But, unlike Linux software RAID, which offers just 1 or 2 parity blocks, our Vault software allows for an arbitrary mix of data and parity. We are currently using 17 data shards plus 3 parity shards, but this could be changed in the future with a simple configuration update.

[Image: Vault Row of Storage Pods]

For Backblaze Vaults, we threw out the Linux RAID software we had been using and wrote a Reed-Solomon implementation from scratch. It was exciting to be able to use our group theory and matrix algebra from college. We’ll be talking more about this in an upcoming blog post.

The beauty of Reed-Solomon is that we can then re-create the original file from any 17 of the shards. If one of the original data shards is unavailable, it can be re-computed from the other 16 original shards, plus one of the parity shards. Even if three of the original data shards are not available, they can be re-created from the other 17 data and parity shards. Matrix algebra is awesome!
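
To make the "any 17 of 20" property concrete, here is a minimal sketch of a Reed-Solomon-style erasure code. It is not the Backblaze implementation: it works over the prime field GF(257) for readability (production codes typically use GF(2^8) so every symbol fits in a byte), it encodes a single symbol per shard, and the names `encode`, `reconstruct`, and `P` are invented for this example.

```python
P = 257  # prime modulus; an assumption chosen for readability


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi), ...], mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, parity_count):
    """Return the data symbols followed by parity_count parity symbols.

    The k data symbols are treated as values of a polynomial at x = 0..k-1;
    the parity symbols are the same polynomial evaluated at the next x values.
    """
    k = len(data)
    points = list(enumerate(data))
    parity = [_lagrange_eval(points, x) for x in range(k, k + parity_count)]
    return list(data) + parity


def reconstruct(shards, k):
    """Recover the k data symbols from shards, where missing shards are None."""
    known = [(x, y) for x, y in enumerate(shards) if y is not None]
    if len(known) < k:
        raise ValueError("need at least k shards to reconstruct")
    return [_lagrange_eval(known[:k], x) for x in range(k)]


data = [ord(c) for c in "Backblaze Vaults!"]   # 17 data symbols
shards = encode(data, 3)                        # 17 data + 3 parity = 20 shards
for lost in (0, 5, 18):                         # lose any three shards
    shards[lost] = None
print("".join(chr(v) for v in reconstruct(shards, 17)))
```

Because a polynomial of degree at most 16 is fixed by any 17 of its values, it does not matter which three shards are lost. A real implementation would apply the same idea to every byte position of much larger shards.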

Handling Drive Failures

The reason for distributing the data across multiple Storage Pods and using erasure coding to compute parity is to keep the data safe and available. How are different failures handled?

If a disk drive just up and dies, refusing to read or write any data, the Vault will continue to work. Data can be written to the other 19 drives in the tome, because the policy setting allows files to be written as long as there are 2 parity shards. All of the files that were on the dead drive are still available, and can be read from the other 19 drives in the tome.

[Image: Building a Backblaze Vault Storage Pod]

When a dead drive is replaced, the Vault software will automatically populate the new drive with the shards that should be there; they can be recomputed from the contents of the other 19 drives.

A Vault can lose up to three drives in the same tome at the same moment without losing any data, and the contents of the drives will be re-created when the drives are replaced.
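
The read and write rules described in this section can be summarized in a few lines. This is only a sketch of the policy as stated in the post; the constant and function names are hypothetical.

```python
DATA_SHARDS = 17
PARITY_SHARDS = 3
MIN_PARITY_FOR_WRITE = 2   # writes need at least 2 parity shards, per the post


def can_read(available_drives):
    """A file is readable while any 17 of the tome's 20 drives respond."""
    return available_drives >= DATA_SHARDS


def can_write(available_drives):
    """New files are accepted only if they would still get 2 parity shards."""
    return available_drives >= DATA_SHARDS + MIN_PARITY_FOR_WRITE


for drives in (20, 19, 18, 17, 16):
    print(drives, can_read(drives), can_write(drives))
```

With one drive down the tome still accepts writes; with two or three down it remains readable but, under this policy, stops accepting new files until drives return.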

Handling Data Corruption

Disk drives try hard to correctly return the data stored on them, but once in a while they return the wrong data, or are just unable to read a given sector.

Every shard stored in a Vault has a checksum, so that the software can tell if it has been corrupted. When that happens, the bad shard is recomputed from the other shards, and then re-written to disk. Similarly, if a shard just can’t be read from a drive, it is recomputed and re-written.
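
A minimal sketch of per-shard checksumming follows. The post does not say which checksum the Vault software uses or how it is stored, so SHA-256 and the "digest followed by payload" layout are assumptions made purely for illustration, as are the function names.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def store_shard(payload: bytes) -> bytes:
    """Return the stored form of a shard: checksum followed by payload (assumed layout)."""
    return hashlib.sha256(payload).digest() + payload


def load_shard(stored: bytes):
    """Return the payload, or None if the checksum shows the shard is corrupted."""
    digest, payload = stored[:DIGEST_SIZE], stored[DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != digest:
        return None  # caller would recompute this shard from the others
    return payload


good = store_shard(b"shard contents")
bad = bytearray(good)
bad[-1] ^= 0x01                      # simulate a single flipped bit on disk
print(load_shard(good) is not None)  # intact shard verifies
print(load_shard(bytes(bad)) is None)  # corrupted shard is detected
```

A shard that fails verification is treated exactly like a missing one: it is recomputed from the remaining shards and written back.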

Conventional RAID can reconstruct a drive that dies, but does not deal well with corrupted data because it doesn’t checksum the data.

Scaling Horizontally

Each vault is assigned a number. We carefully designed the numbering scheme to allow for a lot of vaults to be deployed, and designed the management software to handle scaling up to that level in the Backblaze data centers.

Each vault is given a 7-digit number that looks like a phone number, such as: 555-1001. The first three digits specify the data center number, and the last four specify the vault number within that data center.
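
For illustration, a vault number in this format can be split into its two parts as follows. The function name and the validation rules are assumptions for this sketch; only the "three digits, dash, four digits" layout comes from the post.

```python
def parse_vault_id(vault_id: str):
    """Split a vault number such as '555-1001' into (data_center, vault)."""
    data_center, sep, vault = vault_id.partition("-")
    well_formed = (
        sep == "-"
        and len(data_center) == 3
        and len(vault) == 4
        and (data_center + vault).isdecimal()
    )
    if not well_formed:
        raise ValueError(f"not a valid vault number: {vault_id!r}")
    return int(data_center), int(vault)


print(parse_vault_id("555-1001"))
```

Three digits allow for 1,000 data centers and four digits allow for 10,000 vaults in each, which matches the deployment limits quoted later in this post.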

The overall design scales very well because file uploads (and downloads) go straight to a vault, without having to go through a central point that could become a bottleneck.

There is an authority server that assigns incoming files to specific Vaults. Once that assignment has been made, the client then uploads data directly to the Vault. As the data center scales out and adds more Vaults, the capacity to handle incoming traffic keeps going up. This is horizontal scaling at its best.

We could deploy a new data center with 10,000 Vaults, and it could accept uploads fast enough to reach its full capacity of 90 exabytes in just over a month!
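
The "just over a month" figure can be checked with a little arithmetic. The sketch below assumes decimal units (1 exabyte = 10^18 bytes) and uses the ingest rate of 20 Gbps per Vault quoted later in this post; the function name is made up for the example.

```python
VAULTS = 10_000
GBPS_PER_VAULT = 20        # per-Vault ingest rate from the post
CAPACITY_EXABYTES = 90     # full capacity of the data center from the post


def days_to_fill(vaults, gbps_per_vault, capacity_eb):
    """Days needed to fill the data center if every Vault ingests at full rate."""
    bytes_per_second = vaults * gbps_per_vault * 1e9 / 8   # bits -> bytes
    seconds = capacity_eb * 1e18 / bytes_per_second
    return seconds / 86_400


# About 41.7 days with the figures above.
print(round(days_to_fill(VAULTS, GBPS_PER_VAULT, CAPACITY_EXABYTES), 1))
```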

Backblaze Vault Benefits

The Backblaze Vault architecture has 6 benefits:

  1. Extremely Durable

    The Vault architecture is designed for 99.999999% annual durability. At cloud-scale, you have to assume hard drives die on a regular basis, and we replace about 10 drives every day. We have published a variety of articles sharing our hard drive failure rates.

    The beauty with Vaults is that not only does the software protect against hard drive failures, it also protects against the loss of entire storage pods or even entire racks. A single Vault can have 3 storage pods – a full 135 hard drives – die at the exact same moment without a single byte of data being lost or even becoming unavailable.

  2. Infinitely Scalable

    A Backblaze Vault is comprised of 20 storage pods, each with 45 disk drives, for a total of 900 drives. Depending on the size of the hard drive, each vault will hold:

    4TB hard drives => 3.6 petabytes/vault (Deploying today.)
    6TB hard drives => 5.4 petabytes/vault (Currently testing.)
    8TB hard drives => 7.2 petabytes/vault (Small-scale testing.)
    10TB hard drives => 9.0 petabytes/vault (Announced by WD & Seagate.)

    [Image: Backblaze Datacenter]

    At our current growth rate, Backblaze deploys a little over one Vault each month. As the growth rate increases, the deployment rate will also increase. We can incrementally add more storage by adding more and more Vaults. Without changing a line of code, the current implementation supports deploying 10,000 Vaults per location. That’s 90 exabytes of data in each location. The implementation also supports up to 1,000 locations, which enables storing a total of 90 zettabytes! (Also known as 90,000,000,000,000 GB.)

  3. Always Available

    Data backups have always been highly available: if a storage pod was in maintenance, the Backblaze online backup application would contact another storage pod to store data. Previously, however, if a storage pod was unavailable, some restores would pause. For large restores this was not an issue since the software would simply skip the storage pod that was unavailable, prepare the rest of the restore, and come back later. However, for individual file restores and remote access via the Backblaze iPhone and Android apps, it became increasingly important to have all data be highly available at all times.

    The Backblaze Vault architecture enables both data backups and restores to be highly available.

    With the Vault arrangement of 17 data shards plus three parity shards for each file, all of the data is available as long as 17 of the 20 Storage Pods in the Vault are available. This keeps the data available while allowing for normal maintenance, and rare expected failures.

  4. Highly Performant

    The original Backblaze storage pods could individually accept 950 Mbps (megabits per second) of data for storage.

    The new Vault pods have more overhead, because they must break each file into pieces, distribute the pieces across the local network to the other storage pods in the vault, and then write them to disk. In spite of this extra overhead, the Vault is able to achieve 1000 Mbps of data arriving at each of the 20 pods.

    [Image: Backblaze Vault Networking]

    This does require a new type of Storage Pod, and we’ll be sharing the design of the new pod soon. The net of this: a single Vault can accept a whopping 20 Gbps of data.

    Because there is no central bottleneck, adding more Vaults linearly adds more bandwidth.

  5. Operationally Easier

    When Backblaze launched in 2008 with a single Storage Pod, many of the operational analyses (e.g. how to balance load) could be done on a simple spreadsheet and manual tasks (e.g. swapping a hard drive) could be done by a single person. As Backblaze grew to nearly 1000 storage pods and over 40,000 hard drives, the systems we developed to streamline and operationalize the cloud storage became more and more advanced. However, because our system relied on Linux RAID, there were certain things we simply could not control.

    With the new Vault software, we have direct access to all of the drives, and can monitor their individual performance, and any indications of upcoming failure. And, when those indications say that maintenance is needed, we can shut down one of the pods in the Vault without interrupting any service.

  6. Astoundingly Cost Efficient

    Even with all of these wonderful benefits that Backblaze Vaults provide, if they raised costs significantly, it would be nearly impossible for us to deploy them since we are committed to keeping our online backup service just $5 per month for completely unlimited data. However, the Vault architecture is nearly cost neutral while providing all these benefits.

    [Image: Backblaze Vault Cloud Storage]

    When we were running on Linux RAID, we used RAID6 over 15 drives: 13 data drives plus 2 parity. That’s 15.4% storage overhead for parity.

    With Backblaze Vaults, we wanted to be able to do maintenance on one pod in a vault and still have it be fully available, both for reading and writing. And we weren’t willing to have fewer than 2 parity shards for every file uploaded, for safety. Using 17 data plus 3 parity drives raises the storage overhead just a little bit, to 17.6%, but still gives us two parity drives even in the infrequent times when one of the pods is in maintenance. In the normal case when all 20 pods in the Vault are running, we have 3 parity drives, which adds even more reliability.
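
The overhead percentages and per-vault capacities quoted in the list above follow from simple arithmetic, sketched here. The function names are made up for this example; overhead is measured as parity relative to data, which is how the 15.4% and 17.6% figures come out, and the capacity is raw capacity in decimal units (it includes the space used by parity).

```python
def parity_overhead_percent(data_shards, parity_shards):
    """Extra storage spent on parity, as a percentage of the data stored."""
    return 100 * parity_shards / data_shards


def raw_vault_capacity_pb(drive_tb, pods=20, drives_per_pod=45):
    """Raw capacity of one Vault in petabytes (1 PB = 1000 TB)."""
    return pods * drives_per_pod * drive_tb / 1000


print(round(parity_overhead_percent(13, 2), 1))   # RAID6, 13 data + 2 parity
print(round(parity_overhead_percent(17, 3), 1))   # Vault, 17 data + 3 parity
for size_tb in (4, 6, 8, 10):
    print(size_tb, raw_vault_capacity_pb(size_tb))
```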

What Does This Mean For Backblaze Cloud Backup Users?

Any Backblaze customer who is using Backblaze Online Backup 3.0 or higher is able to use the Backblaze Vaults. (Read the knowledge base article to check what version you’re running.) This will happen automatically; there is nothing to configure or change. Over time, Backblaze will migrate all customer data from the existing Storage Pod architecture to the Vault Architecture.

Summary

Backblaze’s cloud storage Vaults deliver 99.999999% annual durability, horizontal scalability, and 20 Gbps of per-Vault performance, while being operationally efficient and extremely cost effective. Driven by the same mindset that we brought to the storage market with Backblaze Storage Pods, Backblaze Vaults continue our singular focus of building the most cost-efficient cloud storage around.

[4/5/2016 – Updated annual durability to 99.999999% to reflect current operations – Ed.]

 

Brian Beach

Brian has been writing software for three decades at HP Labs, Silicon Graphics, Netscape, TiVo, and now Backblaze. His passion is building things that make life better, like the TiVo DVR and Backblaze Online Backup.
  • David Du

    Which regions do you have? Do they cover Asia?

  • Shahid Rana

    I believe there must be a layer of abstraction for the tome. I am wondering which component writes the shards to each disk in the tome and how the file is retrieved. How do they avoid the central bottleneck? For an academic quest, I am trying to find out how things are done.

  • Jamie Dunbar

    Hi there, can someone explain in realistic terms what a durability of 99.999999 (B2) and 99.999999999 (Nearline) actually means? Surely it is compounded mathematically so more 9s at some point becomes slightly irrelevant?

  • Chris

    Nice Racks, what kind of switches are these?

  • Tharun

    Which RAID is used 2 or 5 or 6?

  • Mark Scott

    I am aware of 2 UK data centres that have one of the lowest PUEs in the industry. Would be good in the future to have people backing up to “nearest” data centre via IP address.

  • Stoatwblr

    Did you just reinvent RAIDZ3 ?

  • Sam Margulis

    Sacramento is more likely to experience an earthquake than a meteor….but I digress…

    How about somewhere with lots of sun and looking into solar energy for your energy requirements? Apple is building or has built a solar farm in Mesa, AZ (outside Phoenix) and is building their datacenter there. Solar should be eventually cheaper than fossil fuel electricity, or even hydro…if it isn’t already close enough.

    • While we’re not quite at the point of adding a second datacenter too far away yet, we’re going to be looking at other locations soonishly! Mesa sounds nice!

  • Randy Green

    Read performance?

  • What has your experience been with disk drives? Which are the most reliable? What measurements indicate proximate failure? What temperatures are best? Google published a paper a few years ago. Could you update it with your information?

  • Blackbeard

    The problem I have with Backblaze is that unlimited backup really means “whatever you can download in 30 days backup” since you will delete my files after 30 days of inactivity. Being in Europe with awful peering to the U.S. west coast, I get 3Mbps download from Backblaze (despite being on 100Mbps). 3Mbps = approximately 30GB per day, 30*30 = 900. So any backup above 900GB (probably less due to overhead) is completely useless for me to store on Backblaze. You will delete my files before I have a chance to recover them.

    Yeah I could order the USB drive from Backblaze. But the 189$ restore cost is not something you really want from a 5$/month service. I suppose I could also invest in a ton of 500GB drives and spread out my data on more physical drives to avoid big restores from Backblaze.

    Any updates on this issue Backblaze? This is the reason I’m not your customer yet, been wanting to for years.

    • Robert Trevellyan

      If you think of cloud backup as primary, it’s never going to be fast enough except for small restores. If you recognize that any cloud backup service is a last resort when both your main storage and your local backup fail, maybe the fact that you could actually get your data back for $189 doesn’t seem so bad.

  • Well, a clear and concise explanation by Mr. Beach. Linux-based RAID 6 using 15-HDD drive stripe sets (3 or 4 such sets per pod) was not ideal for a backup service that has already reached 150PB of stored data. Moving to the use of Reed-Solomon based forward error correction codes or erasure codes creates a simpler and more durable backup service. The selection of 17:3 for the data shard to parity shard ratio “seems” like not enough protection…16:4 “feels” like it would have been a better choice even though it increases the storage overhead. Will be interesting to see how moving to the use of erasure codes changes the CPU and RAM requirements in the new BackBlaze storage pod design.

  • lauraflorezc_ob

    Just imagine how cool it will be when the datacenter could be located even in safer places… like off-planet!!! One on the Moon, one on Mars… :-P

    • Brian Beach

      My guess is that we’d have a hard time hiring data center technicians. The commute would be pretty long. :-)

      Seriously, though, if you know anybody who would love to work at a data center in Sacramento, let us know!

  • PETER GREEN

    What are you doing about IP addresses? does every vault server have a public IPv4 IP? if not how do the clients access them? if so do you see IPv4 address shortages having a significant impact on your costs and/or scalability?

  • Guille -bisho-

    Having disks in the same position form a tome has some safety implications. For example imagine that a flood in the datacenter destroys all the bottom pods. The data will be gone for good.

    Being less drastic, everybody knows disk reliability depends on factors like temperature and vibration, and those depend mostly on the position of the drive in the pod. Some (in the middle/back) will run hotter than others (in the front sides), so some tomes will have different reliability.

    Spreading tome drives in different racks, different position within the rack and different position within the pod is inherently more reliable, at the cost of some management/operational complexity, but nothing that good tooling can’t fix.

  • Hawkwing

    You guys are awesome! Love the triple parity. Love even more that you share so much with everyone.

  • Paul Kennedy

    Are there plans to allow backup from Windows Server 2012?

    • Paul Kennedy

      Did anyone see this and plan a response??

      • Brian Beach

        We have no immediate plans to support Windows Server.

  • Mark

    Will existing backups and Backblaze hardware already installed be migrated to this architecture? Or is this for new pod deployments/backups only?

    • Brian Beach

      As the posting says, just before the summary…

      Any Backblaze customer who is using Backblaze Online Backup 3.0 or higher is able to use the Backblaze Vaults. (Read the knowledge base article to check what version you’re running.) This will happen automatically, there is nothing to configure or change. Over time, Backblaze will migrate all customer data from the existing Storage Pod architecture to the Vault Architecture.

  • Carlos Herrera

    This architecture is very similar to http://en.wikipedia.org/wiki/Cleversafe

    Are the following the main differences in the architectures?
    – Cleversafe is an Object Storage System with HTTP proprietary and S3-Compliant APIs as opposed to the Backblaze File System. Actually, does Backblaze publish any APIs that allow custom clients to store/retrieve files?
    – Any Backblaze Storage Pod can slice the incoming file and disperse it to other pods taking advantage of the drive position to know where to write it to instead of maintaining a look up table of shard locations.

    – Any Backblaze Storage Pod can reassemble an outgoing file again taking advantage of the drive position to know where to read the data from instead of maintaining a look up table of shard locations.

  • adamsb6

    I think it would be smart to reconsider having each drive in a tome occupy the same slot in a storage pod. Heat distribution within a pod is probably pretty consistent, and it’s possible you might end up with a batch of drives that aren’t tolerant to heat. If a small heat difference leads to a big spike in drive failure, you might end up losing tomes with this kind of distribution, where a random distribution would be more tolerant of that kind of failure.

  • Ryan Harvey

    Brian, has a study been done on drive failure with respect to the drive’s location in the rack (height)? Looks like you guys are using CRACs with a raised floor, and air can have a hard time getting to those top racks. Just curious.

    • Brian Beach

      We haven’t looked at that yet, but we have looked at correlations with temperature. Our warmest drives are well within their operating range, and don’t fail any more than the cooler drives.

  • Louwrentius

    Reed-Solomon erasure coding is what Ceph has implemented. Any relation to that?

    • Brian Beach

      Our implementation has no relation to Ceph. My understanding is that Ceph supports multiple erasure coding plugins. For Backblaze Vaults, Reed-Solomon is good enough, so we stuck with that and didn’t complicate things.

      • Louwrentius

        Ok, thanks! Very cool!

  • Belvedere

    Hey, Backblaze! When you make a blog post, I really appreciate that you are right there to answer questions in the comments section. That’s pretty cool – and, thanks :)

    • You’re welcome!

  • Are the parity drives dedicated and fixed? Or are they spread around, like in Linux’s md raid5/raid6?

    • Brian Beach

      We decided to spread the parity around. Each drive has some data and some parity.

  • Tristan Rhodes

    Thanks for sharing this, I really enjoy reading it. Would you consider sharing your network architecture in a future post? Also, are you involved in any way with the Open Compute Project?

    • > sharing your network architecture

      It’s a great idea for a blog post! We surely could have used some advice early on. In a nutshell, we have 40 Gbit/sec flowing into the datacenter (backups), and about 1 Gbit/sec flowing out mostly in restores (this week’s numbers, always growing). We use a few different service providers like Cogent in a redundant fashion so if Cogent disappears we still have service. The big fast switches are redundant so we can lose one and not lose traffic. Then it fans out to slower switches in little “rings” where you can actually lose 1 out of 8 switches and the packets still will route mostly.

  • kar

    Perhaps I missed this in the article, but are the 20 pods making up a vault physically co-incident, or distributed at random in the data center?

    • In our implementation they are all in nearby racks, but they can certainly be randomized.

  • Dahc Renrut

    Any chance with this fancy new set up we can start backing up NAS drives too? Come on!!! I have all of my music on a NAS drive to send to my SONOS speakers and would love to be able to back that drive up as well.

    • Sorry Dahc, not yet – but we’re hoping to come up with a solution soon!

      • gavingreenwalt

        How about Cloud Storage as well?

        One feature I would really like to see added is (and let me explain) CloudStorage-Backup. I want a backblaze backup of my cloud storage. For instance I have a OneDrive account with data spread out across multiple machines… but not necessarily all on any given machine backed up by backblaze. Some of my data resides exclusively on OneDrive. If however I could have a cloud backup solution that logs into OneDrive and uses the OneDrive SDK to download and backup all of my data then I would know that I have redundancy on all of my OneDrive storage as well and I wouldn’t need to use an intermediary host that is set to download 100% of my OneDrive storage for Backblaze to see it.

        • Interesting. Though I think that situation only arises when you have files exclusively in OneDrive. If you are really diligent, you could have a copy of the data on your computer, one copy in OneDrive to get to easily, and then Backblaze would be backing up the copy that’s on your computer!

  • Martin Jones

    How do you know which drives are bad?

    • Hi Martin! We have a lot of processes continuously running throughout our farm looking for drive errors. We wrote about some of the things we look for here -> https://www.backblaze.com/blog/hard-drive-smart-stats/ !

      • Martin Jones

        How do you then find the drive? I assume your system can tell you exactly which drive in which machine? You must have one crazy complicated inventory system!

        • Yes to both :-p

          • Martin Jones

            I’ve been a storage nerd for twenty years, and this is the coolest stuff since PMR.

      • There are three levels of answer to this question:

        1) We attempt to write data to the drive and it simply fails. Meaning the OS realizes the drive is completely, utterly, dead or even missing. No data can be read or written – time to replace that drive.

        2) Some subtle errors are beginning to appear in the SMART drive stats (see Yev’s answer and this blog post https://www.backblaze.com/blog/hard-drive-smart-stats/).

        3) We pass over every single file on every single drive on a slow basis and read every last shard off of disk, recompute its checksum, and look for missing shards. Drives sometimes get hit by a cosmic ray and flip a bit. So when this occurs, we rebuild the shard from the other pods in the vault. But along the way if a drive exhibits too many errors, something is goofy and we might swap out the drive. This is different than #2 – it may not be throwing SMART errors and we STILL will replace a suspicious drive (or maybe a SATA cable, or a network card, etc).

        • Adela

          I am wondering how many of the drives that you replace have physically failed? That would be category 1) that you mention. And for the case of predicting the drive’s failure based on SMART, do you also rely on the built-in prediction, or does your in-house prediction catch drives before SMART would mark the drive as in danger of failing?

          In the data that you’ve made available about the drives in your datacenter, is there any way to tell which is the reason that a drive was replaced? That would be very interesting to see!

          • I just wandered by the desks of the guys in charge of drive replacement decisions, and they say out of the 10 drives a day that are replaced, about half are completely dead and the other half are “proactive” replacements. For the proactive replacements, it’s based on errors in kernel logs or certain high SMART numbers, all based on some rules of thumb our drive replacement guys kind of agree on.

            Also, pretty often in our world there are patterns that appear we’ve never seen before. For example, after two years of relative stability after we bought a bunch of these Seagate 3 TB drives – after two years they suddenly start failing at a much higher rate than other drives. We hold a strategy meeting, and possibly decide to proactively replace every single Seagate 3 TB in the datacenter – even drives that haven’t thrown a single error yet. Or maybe we decide to get super sensitive about the tiniest SMART statistics but ONLY on the Seagate 3 TBs. Stuff like that.

          • Paul Kennedy

            Do you ever redistribute semi-failed hardware? I know I’d be interested in obtaining ‘partial-failed’ drives etc…

          • Paul Kennedy

            Any response?

          • Adela

            Hi Brian, thanks for the information! I have two questions –
            * In the dataset that you’ve made publicly available, is there a mix of preemptive failures and physical failures?
            * If this is the case, can we distinguish between failures and preemptive replacements? Could you provide us with a way of identifying the two categories? We are interested only in actual failures, and we’d like to have a way to filter the preemptive replacements out.
            Thanks!

        • John Williams

          Most HDDs spec sheets list a URE (unrecoverable read error rate) of better than 1 in 10^14 bits or 1 in 10^15 bits. I’ve always wondered how accurate these numbers are.

          Is backblaze able to compute an overall URE rate for their drives?

  • Bltserv

    I would suggest looking at some Disk Drive handling classes for your lab. Stacking drives like cordwood is a huge “no-no”. Handling a modern drive properly is very important. The shock of drives touching a hard surface during handling can create premature failures. Padding between drives and hard surfaces is essential. And if that stack pictured slips and the drives fall over, figure about a 10% early failure rate. I would get that photo pulled before an OEM like Seagate voids your warranties.

    • Definitely! If I were to guess, that pod has come out of production and the drives are on their way to be eradicated. They’re certainly protected before going into production, but definitely good information for folks reading!

  • geoah

    Are the v4 or v4.5 schematics public for users to download, cut, fold and build their own pods?
    First 3 versions were open/public iirc but can’t find anything for v4 and up. — Or am I wrong here?

  • Tom

    Any plans on building a general cloud storage service like dropbox or google drive?

    • Nothing for now, but I suppose it’s possible, if we could figure it all out and there’s a good enough reason to put resources in it!

  • Ian Worthington

    Is this going to give any relief to people who can’t use the full bandwidth of their connections? Over a month ago I posted to https://www.backblaze.com/blog/broadband-getting-broader/ the following comment and am still waiting for an update from BB:

    @Hectic: This is a known issue for Backblaze. I am currently in northern South America and can only achieve 50% usage of my 2Mbps upstream. Using a VPN was not suggested to me: instead I was told that as the upload protocol neither windows nor runs parallel streams the tx rate degrades with ping speed. Apparently it’s on the “to do” list but is not considered high priority.

    I’m wondering now if this is the full story. My tx rate did increase when I upgraded from 1Mbps to 2Mbps, and if Dee is getting 10Mbps over a 100Mbps line then does this reason make sense?

    Anyone from BB care to comment?
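
    The explanation quoted above (no windowing, no parallel streams, so the rate falls as ping rises) can be illustrated with a simplified stop-and-wait model. This is only a sketch: the 64 KiB chunk size, the 250 ms round trip, and the function name are assumptions chosen for illustration, not details of the actual Backblaze upload protocol.

```python
def link_utilization(chunk_bytes, rtt_s, link_mbps, streams=1):
    """Fraction of the link kept busy when each stream sends one chunk,
    then sits idle for a full round trip waiting for the acknowledgement.
    All inputs are hypothetical; this is not the real upload protocol."""
    transmit_s = chunk_bytes * 8 / (link_mbps * 1_000_000)
    per_stream = transmit_s / (transmit_s + rtt_s)
    # Several independent streams can fill each other's idle time,
    # but together they can never exceed the capacity of the link.
    return min(1.0, streams * per_stream)


if __name__ == "__main__":
    for rtt_ms in (20, 100, 250):
        for streams in (1, 4):
            u = link_utilization(64 * 1024, rtt_ms / 1000, 2, streams)
            print(f"rtt={rtt_ms:>3} ms  streams={streams}  utilization={u:.0%}")
```

    Under these assumed numbers a single stream on a 2Mbps link with a 250 ms round trip uses roughly half the link, and a few parallel streams would fill it. In this model a faster link makes the idle round trip a larger share of each cycle, so the utilized fraction drops as bandwidth grows even though the absolute rate still rises.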

    • Yup! We’re working on something that we hope will help out with high-latency connections. Stay tuned!

    • loxposax

      Speeds from Europe to Backblaze are really poor.

      • James Price

        with EU data protection laws is it wise to be using a US based backup service?

        • loxposax

          Why does it matter?

          My data is encrypted locally before it is uploaded.

      • loxposax – contact Backblaze support and tell them about your issue. Maybe you can help “beta test” a new feature that I hope alleviates your issue. Go to https://www.backblaze.com/help.html and do “Live Chat” or “Submit Request”.

    • Ian – contact Backblaze support and tell them about your issue. Maybe you can help “beta test” a new feature that I hope alleviates your issue. Go to https://www.backblaze.com/help.html and do “Live Chat” or “Submit Request”.

  • Olivier Chédru

    Is data replicated/backed up to another data center?

    • Right now, Backblaze has only one datacenter, so the short answer is “no”. :-)

      The longer answer is that for online backup, there is one copy of your data on your laptop, and another copy in the Backblaze datacenter in Sacramento. If a meteor hits our datacenter in Sacramento pulverizing it into atoms, you STILL would not lose one single file, not one – because your laptop is still running just fine where ever you are with your copy of the data. In the case that occurs, we will alert our users they should make another backup of their data.

      • Shaun Moon

        It would be tough to alert anyone after being hit by a meteor! Now THAT’S dedication!

        • Depends on the size of the meteor…luckily our office is pretty farm from the datacenter, so there’s a chance!

          • A pretty cubicle farm I guess ?

          • Sigh…far :-p

          • dakishimesan

            I was going to say, your office is on a farm? Cool! And not just any farm, a pretty farm!

            Can I just say: in my technology designs for IT work and my personal computing needs, as well as in my personal life, I am a minimalist and value simplicity, and I continue to be very impressed with the beauty and simplicity of the storage pods you have made, and now, understanding your software, with that too. I love, love that your hard drives are just storing basic ext4 files with the parity software working independently, and that each shard is checksummed individually. It’s extremely elegant.

            I’ve considered building a storage pod for personal use just because I admire the architecture so much. I am curious: do you have an area in your data center where you experiment with variations of the pod design? For example, pods with more vibration dampening to see if that extends hard drive life, pods with different cooling arrangements for the hard drives, pods with SSD caching, etc.?

          • Short answer to that is definitely, we’re constantly tweaking our designs (the 1.0, 2.0, 3.0, 4.0, and 4.5 posts come out once we reach a happy plateau, but then start iterating again).

        • Recently I’ve been intrigued by the concept of a “Dead Man’s Switch”. http://en.wikipedia.org/wiki/Dead_man%27s_switch

          Backblaze could have an email go out from a remote location automatically to all of our users *IF* we don’t stop the message from being sent once per hour. That way after the meteor hits and kills us all this remote location sends one last communication to our customers saying, “Thank you for being a customer, but Backblaze seems to have vaporized.” :-)
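
          A minimal sketch of the idea described above, assuming a hypothetical `DeadMansSwitch` class that a remote machine would poll. The class name, the one-hour timeout, and the callback are illustrative assumptions, not anything Backblaze has built; time is passed in explicitly so the logic is easy to follow.

```python
class DeadMansSwitch:
    """Fires a callback once if nobody checks in within timeout_s seconds.
    Hypothetical illustration only."""

    def __init__(self, timeout_s, on_fire, start_s=0.0):
        self.timeout_s = timeout_s
        self.on_fire = on_fire
        self.last_check_in_s = start_s
        self.fired = False

    def check_in(self, now_s):
        # The operators "stop the message" by checking in before the deadline.
        if not self.fired:
            self.last_check_in_s = now_s

    def poll(self, now_s):
        # Called periodically by the remote location; fires at most once.
        if not self.fired and now_s - self.last_check_in_s >= self.timeout_s:
            self.fired = True
            self.on_fire()
        return self.fired


if __name__ == "__main__":
    switch = DeadMansSwitch(3600, lambda: print("Sending the final email"))
    switch.check_in(3000)      # operators are still alive
    print(switch.poll(6000))   # only 3000 s since check-in: not fired
    print(switch.poll(6600))   # 3600 s with no check-in: fires
```

          The switch fires only once, so customers would receive a single message no matter how often the remote side keeps polling afterwards.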

          • mAurelius

            I’m suddenly having flashbacks to Lost and pushing the button every 108 minutes in the Swan station.

      • Omid A.

        That’s optimistic. The more users you have the more likely it is that right after a meteor incident, at least one of your users’ laptops crashes too.

        Replication factor of 2 is never enough to secure data. Once a copy fails you are on the verge of data loss.
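
        The point about user count can be made concrete with a short calculation. This is a sketch under a strong assumption: each user's local copy fails independently with the same small probability during the recovery window. The probability values below are made up for illustration and are not Backblaze statistics.

```python
def p_any_local_copy_lost(users, p_single):
    """Probability that at least one of `users` independent local copies
    is lost, given each is lost with probability p_single (assumed)."""
    return 1.0 - (1.0 - p_single) ** users


if __name__ == "__main__":
    for users in (1, 1_000, 100_000):
        p = p_any_local_copy_lost(users, 0.0001)
        print(f"{users:>7} users -> {p:.4f}")
```

        Even when the per-user probability is tiny, the chance that somebody is affected approaches 1 as the number of users grows, while the chance for any one particular user stays at the small per-user value.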

        • It is true.

          What I recommend to my very closest family members and trusted friends is that for data you feel would be catastrophic to lose, I recommend you have at least three copies including the primary copy. That is two separate backups with TWO SEPARATE VENDORS who did not share any code, hopefully managed by two separate UIs. For bonus points, one backup should be “offsite”. For example, many Backblaze customers use Time Machine on the Macintosh for a local backup, and Backblaze for their remote backup, and that is what EVERYBODY should be doing. I can show you many support cases where Time Machine failed to restore a file and Backblaze saved the day, and vice versa. The fact is that users make mistakes, UIs are hard to use, your 14 year old son decided to unplug the Time Machine USB hard drive to free up a USB port (which disables that backup but Backblaze saves the day), or your 14 year old daughter decided to uninstall the Backblaze agent – in other words, stuff happens!!

          No matter how many copies Backblaze has in how many datacenters, it is simply better for you to have two backups with two separate vendors (as long as they also aren’t in our same datacenter when the meteor hits).

          • Three copies, at least two locations. Say it loud and proud.

          • dakishimesan

            A quick suggestion in this regard. I divide my personal files into a mission-critical set and a regular set, and while the entire pool is backed up using Backblaze, the critical set is also synced to Microsoft OneDrive. While sync is not exactly the same as a backup, those critical files, in addition to being stored in a different data center, are also synced within minutes of saving them. OneDrive does not allow a private encryption key, but on Mac OS X I use sparse bundles that are encrypted.

          • Belvedere

            Does Backblaze plan to eventually have multiple data centers in different places? Or will it always be just the one?

          • I doubt we will open a second datacenter in 2015, but soon after yes. The most obvious reason is we will run out of space in our current location – to continue to accept customers we will need to expand SOMEWHERE after we fill the current datacenter to the roof.

            A much more interesting question is “where”? Our electrical power bill is one of our largest ongoing costs, so the new datacenter might be in a location with the cheapest electrical rates. For example, up along the Oregon/Washington border where Google and Microsoft also put huge datacenters (probably for the inexpensive electrical power generated by hydro-electric).

          • Belvedere

            I think it’s very cool that you are factoring renewable energy sources into your decision making!

          • Sebastien

            Canada could be interesting for you guys. Particularly Quebec where electricity is very affordable and mostly hydro-electric and cold weather for a good part of the year helps in lowering costs. There are a few established data centers in Montreal, including OVC.

          • elementxero

            Well when you do we all expect an equally in-depth explanation of your WAN replication solution ;)

          • Columbus, Ohio is a good place for a high tech company looking for machine room space.

          • This comment is kind of random, but I’m a Canadian living in the Bay and would be remiss not to mention great Canadian electrical prices.

            The provinces of Manitoba and Quebec both have _major_ hydro-electric stations that seem to rival the prices of Washington and Oregon. Plus Manitoba has a ridiculously low ambient temperature for most of the year, the cooling practically does itself :)

          • Thanks Gaëtan! Canada is really good to consider for the reasons you mention, plus having servers in two countries is very interesting for a variety of great reasons. (Some customers are uncomfortable with even heavily encrypted data living in the USA.) And it is “less far away” to visit than a datacenter in Europe.

          • Generic42

            You should definitely look at the Cheyenne Wyoming area, other major DCs have settled on this area due to low electric costs, low cooling costs and a stable environment.

          • Alexander Wood

            God no, that’s all dirty coal-fired power from the Powder River Basin. California’s power is less dirty by far, even if it is more expensive.

          • Generic42

            If you discount the giant wind-farm that is 10 miles away, yes. But if they want clean energy, it’s available.

          • Alexander Wood

            Coal is the reason it’s cheap, and is what supplies most of the power.

          • Kristoffer Fagerlund

            Try Luleå, Sweden, for cheap, reliable electricity and cooling ;)

          • I just looked up Luleå, Sweden and Facebook has put a datacenter there! It’s definitely on the (future) candidate list now. I’ve also heard of datacenters up in Finland and Norway, we’ll need to look into all of them. I’m officially volunteering to do a scouting trip for Backblaze. :-) I have not yet been to any of these countries, and I would really like to visit.

          • Kristoffer Fagerlund

            FB has two data centers there now, one operational and another one being deployed.
            British Hydro66 has also built a data center in the region (Boden).

            A new research center is also being built there now, targeting cloud storage and very large datasets.

            https://translate.google.com/translate?u=http://www.nyteknik.se/nyheter/it_telekom/allmant/article3917528.ece&langpair=sv%7Cen&ie=UTF8

          • Kristoffer Fagerlund

            It seems that I’m not fully up to date; there are too many data centers being built here now. Five centers in total in the region.

            From the article (via google translate):

            “In recent years, several large server rooms built in northern Sweden. Best known is Facebook’s two big halls in Luleå. In Constance are KNC Miner with two halls and a third in the works. It also Hydro66.

            Earlier this week announced Fortlax to establish a new 1 MW and 1,000 m² data center in Piteå.”

          • C_W_

            My hometown! Pryor, Oklahoma. I’ve been trying to tell people this for 18 years! (and then Google goes and builds a datacenter there… no one ever listens to the geeks in high school…lol)

          • Come to Quebec City. Cheapest electricity (hydro) and average annual temperature is 40F, so easy to keep cool ;)

            And the mayor here LOVES datacenters. One that just opened here : http://www.4degrees-colo.com/

          • Valča

            Latvia! Some of the EU’s best datacenters are located here. Very ecological energy, and Europe’s fastest internet :) And our country has pretty cold weather (except for about one month a year). There are lots of military bunkers, great for datacenters. One is situated 9m underground. They say a nuke could strike and it would survive.

          • elvis spada

            Ever thought of alternative places to put a datacenter – like in Albania? It would be interesting to evaluate this option and build a datacenter not far from hydropower plants where energy cost is very cheap and also land too!
            As far as I know there are incentive policies which depending on the investment, you can buy land for 1 (one) Eur per square meter!

      • Snowman2000

        Are there plans to add a datacenter at a geographically dispersed location? I was just reading about the increased chances of the “big one” quake in CA. Since I’m also in Northern California, it would be nice to know if a quake affected the Bay area and Sacramento there was another copy somewhere.

        • Brian from Backblaze here – when we went through the search for the current datacenter, I personally was surprised to find out that not all of California is at increased risk for earthquakes, and we located our current datacenter in a place that simply isn’t affected by quakes any more than any other place on earth. Here is the way we chose the current location: https://www.backblaze.com/blog/our-secret-data-center/ which includes the earthquake maps for California

          ALSO -> we made sure it is *NOT* in a flood plain (much of Sacramento can be submerged under water once in a while), plus we wanted to avoid Tornados and Hurricanes. I’m dead serious, you may as well just never ever EVER have to deal with Hurricanes.

          • dakishimesan

            It’s nice to know that so much thought went into the physical location of the data center.

            Even if there was a second data center, I’m not sure if it would be economically feasible to make parity duplications between them at the current price point.

          • Falcon89

            Couldn’t there be 2+ data centers and your data gets assigned to one of them for $5/month and the option to have the data at 2 for $7.50-$10/month

          • Fabian Franz

            I agree, $10 for redundant storage in several data centers would be a great deal. Obviously the routing would be a little more complicated, but if one data center is pure backup – just in case, then in normal operation all that is needed is to distribute the upload to the backup center, too.

      • marcatdisqus

        As a former sysadmin I can see a great reason for multiple locations. Backup is all about redundancy. I used to have my original copy on the server. Then backup to another server in the data center. Also backup to tape which then goes offsite. Then also backup to another geographically diverse datacenter.

        This might be overkill for somebody’s photos or documents of course. I use Backblaze and love it though. Have it on two of our home computers. I use the logic that I have at least one off-site backup which should remain safe if I have a catastrophic failure here. Being a bit more paranoid though, I also do backup to another drive locally using Acronis. Furthermore, I backup very important files and all photos to Google Drive (and Picasa).

        The argument you make is that if Backblaze gets hit by a meteor or nuke we still have our local copy. That is probably true but I have to wonder how many 9’s of reassurance? I’ve seen some crazy coincidental failures in the data centers where losses occurred in under 24 hours. I always figured “what are the odds???”. It does happen though.

        I suggest it all depends on just how important files are for you. If they are priceless like maybe family photos or whatever you might add one extra level of backup to your local and backblaze copies. If Backblaze had a 2nd and possibly 3rd data center with very diverse geography one could probably do without the extras although I’m paranoid enough I probably would still have a few copies laying around somewhere :-)

        No matter what Backblaze is a phenomenal service and a great value at what they charge. I’d always use it as one of my tentpole strategies of backup. The crazy thing is MOST people still have NO backup at all. Not even a local hard drive backup. These are people that always learn the hard way after it is too late. I’ve met some of these people who came to me crying when their drive failed and wanted me to save everything. I always say “didn’t you back everything up???”

        • dakishimesan

          I have some great recovery software for clients in just these situations, but yes, you’re correct that while we tend to worry about multiple data centers, most people don’t even back up to an external drive.

        • Absolutely! I don’t think anyone is arguing for only having one datacenter. In Backblaze’s case, we were a bootstrapped company. Any additional datacenters would at least double our costs, so we have to make tradeoffs. With the Vault architecture the “random datacenter weirdness” quotient gets really low, and we’re pretty happy with that! Of course having more datacenters would be better, and it’s possible that we will get there in the future, but like all things Backblaze, we have to make sure we get there wisely :)

          • marcatdisqus

            I agree you guys have done it the right way. When you are bootstrapping you don’t want to shred the P&L and incur a 2nd datacenter cost. You guys will get there eventually I am sure.

            When you do get there the trick will be how to shard the data between centers for similar redundancy on a per data center basis while also not doubling costs.

            Amazon S3 is an interesting case study. They put data in 3 geolocated centers and offer eleven 9’s redundancy. At around 3 cents/GB that is $3 per terabyte. Not sure what your average user stores but my guess is around a few 100GB max average. Only thing I don’t know is if amazon makes money at these prices after paying all the overhead. Amazon not exactly known for profitability….

            Anyway, I LOVE the pods you guys build and the technology. As an end user on both PC and Mac so easy to install and use. Set it and forget it peace of mind. Keep up the good work.

          • palesz

            3 cents/GB means $30 / TB (and not $3)
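
            The correction above is a plain unit conversion, sketched here assuming decimal units (1 TB = 1000 GB). The function name is made up for illustration.

```python
GB_PER_TB = 1000  # assumption: decimal units, as storage vendors usually quote


def dollars_per_tb(cents_per_gb):
    """Convert a price in cents per GB into dollars per TB."""
    return cents_per_gb * GB_PER_TB / 100


if __name__ == "__main__":
    print(dollars_per_tb(3))  # 3 cents/GB expressed in dollars per TB
```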

      • department_g33k

        I live within a meteor’s impact-radius of Sacramento, so in your example, I’d lose data. Of course, I’d also be DEAD, so data lost probably isn’t as big of a deal.

      • Nicholas Reichart

        Build a secondary location in the Midwest, all the big companies are doing it ;) (mostly in Iowa)

        • +1 Iowa. See @brianwski:disqus.

      • mAurelius

        While I am also a big advocate of the 3-2-1 backup strategy, I just have to make one slightly off-topic comment: whenever there are discussions about meteors or nukes, I often ask myself: “how much will I really care about my data if there is an event catastrophic enough to take out entire datacenters?”

        • Belvedere

          Well if it’s just catastrophic enough to take out a data center, you will probably still care pretty much about your data ;)

        • Marius

          What about an EMP strike? Your HDDs are toast, Backblaze’s maybe as well, but you might be very well alive.

      • Greg Caulder

        Well that might not be the case if you live extremely close to Sacramento, like I do.

  • John Smith

    Why not leverage ZFS for its built-in checksumming abilities? Was it considered, but found to not be worthwhile? If so, why?

    • geoah

      ZFS requires 1GB of RAM per 1TB of space, or something like this. :P

      • Tom

        only when you have deduplication turned on, which wouldn’t make sense here anyway.

        • dakishimesan

          As far as I understand they do use a form of deduplication, but it’s in their primary software stack not in the filesystem.

    • Brian Wilson from Backblaze here (not the author Brian Beach) – The Vaults could use ANY underlying single machine file system, we basically need a very simple way to store individual files and read them back later. We do all our own checksumming at a higher level for our own logic purposes. ZFS wouldn’t harm us, but it wouldn’t help at all over what the default Ext4 provides for our purposes.

      • I was just about to write that…you’re so fast! -> Also, we were already using EXT4 so it was an easier implementation, plus we know it performs great in our environment.

      • Tom

        I think ZFS even on a single disk has checksums for blocks which allows it to correct single bit read errors by itself. That would add a little extra parity I suppose.

        Not that this would matter much, just pointing out that it would in fact help a little.
        Edit: Obviously also needing slightly more space.

        • Robert Trevellyan

          ZFS can detect an error on any single disk, but requires at least 2 disks to correct an error.

      • John Smith

        So what’s effectively happening is that you’re doing the checksumming at the file level… whereas ZFS does checksumming at the file system level? Interesting approach, but good to hear that checksums are being used. Not many folks out there even know or care about them.

        • We also do checksumming and have logic across 20 machines (pods), plus we do load balancing across our 150 petabyte datafarm. As much as we would like to use some off-the-shelf software, we just never found anything inexpensive that worked in our particular case. In the end, I think introducing a more complex file system like ZFS is probably counterproductive to our needs. As an alternative to ZFS, we could store each file in an SQL database – it would work, but it wouldn’t make it simpler or less expensive, so why? ZFS is really, really cool – but if you aren’t going to use or depend on any of the cool features, it might not be the best choice.

          • Chris M.

            I just want to point out that the level of transparency and openness typified by this thread is why I use and constantly recommend you guys. I’m the “trusted tech guy” to many, and so insight like this helps ensure I’m always making good recommendations.

    • Asteroza

      In this case, a ZFS SAN frontend connecting to exported individual disks in a pod via linux iSCSI targets might work. ZFS frontend server has ZFS checksumming on the block level on the front end, RAIDZ3 gets triple parity, a tome = ZFS vdev if the RAIDZ3 vdev member disks are the 20 iSCSI exported disks. The added benefits are ZFS does live inline block repair from block parity, plus LZ4 block compression to reduce SAN network traffic and ultimately the bandwidth to individual disks. L2ARC read caching isn’t necessary for backblaze’s intended usage, but ZIL writezilla type devices may become critical to accelerate writes. As far as architecting individual files to a specific tome, versus increasing the number of tomes that a file is spread out on, that depends on other factors, but ZFS would favor splitting the files across multiple tomes for read acceleration via increased spindle count (zpool with multiple vdev’s), but there’s nothing stopping you from doing single vdev zpools. The file sharding capability is functionally built into the zpool/vdev construct, such that a single zpool with a single RAIDZ3 vdev as a ZFS filesystem containing ZFS files is equivalent to a single tome with a sharded file. With ZFS, it generally likes vdevs with a power-of-two number of data disks plus parity (2^n D + P), so would favor a (2^4)D+(3P) = 16+3 = 19 disk vdev, similar to the 20 disk tome layout but slightly less space efficient (% of raw storage devoted to parity), but can also scale higher for a vdev as ZFS limits itself at 3 parity disks for a RAIDZ3 while still increasing data disks.
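
      The space-efficiency comparison at the end of that comment can be checked with a few lines. This sketch compares the 16+3 vdev proposed above with the 17 data plus 3 parity shard layout the article describes for a 20-pod Vault. The function name is made up for illustration.

```python
from fractions import Fraction


def parity_overhead(data_disks, parity_disks):
    """Share of raw capacity spent on parity for a data+parity layout."""
    return Fraction(parity_disks, data_disks + parity_disks)


if __name__ == "__main__":
    for name, data, parity in (("RAIDZ3 16+3", 16, 3), ("Vault 17+3", 17, 3)):
        share = parity_overhead(data, parity)
        print(f"{name}: {share} of raw space = {float(share):.1%}")
```

      With the parity count fixed at three, every additional data disk lowers the parity share, which is why the 19-disk layout comes out slightly less space efficient than the 20-disk one.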