Yes, Backblaze Just Ordered 100 Petabytes of Hard Drives

By Andy Klein | October 5th, 2017

10 Petabyte vault, 100 Petabytes ordered, 400 Petabytes stored

Backblaze just ordered 100 petabytes’ worth of hard drives, and yes, we’ll use nearly all of them in Q4. In fact, we’ll begin the process of sourcing the Q1 hard drive order in the next few weeks.

What are we doing with all those hard drives? Let’s take a look.

Our First 10 Petabyte Backblaze Vault

Ken clicked the submit button and 10 Petabytes of Backblaze Cloud Storage came online ready to accept customer data. Ken (aka the Pod Whisperer) is one of our Datacenter Operations Managers at Backblaze, and with that one click he activated Backblaze Vault 1093, which was built with 1,200 Seagate 10 TB drives (model: ST10000NM0086). After formatting and configuration of the disks, there are 10.12 Petabytes of free space remaining for customer data. Back in 2011, when Ken started at Backblaze, he was amazed that we had amassed as much as 10 Petabytes of data storage.

The Seagate 10 TB drives we deployed in vault 1093 are helium-filled drives. We had previously deployed 45 HGST 8 TB helium-filled drives, from which we learned one of the benefits of using helium drives: they consume less power than traditional air-filled drives. Here’s a quick comparison of the power consumption of several high-density drive models we deploy:

Manufacturer  Model            Fill    Size   Idle (1)   Operating (2)
Seagate       ST8000DM002      Air     8 TB   7.2 watts  9.0 watts
Seagate       ST8000NM0055     Air     8 TB   7.6 watts  8.6 watts
HGST          HUH728080ALE600  Helium  8 TB   5.1 watts  7.4 watts
Seagate       ST10000NM0086    Helium  10 TB  4.8 watts  8.6 watts

(1) Idle: Average idle consumption in watts as reported by the manufacturer.
(2) Operating: The maximum operational consumption in watts as reported by the manufacturer, typically for read operations.
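
To put the table in per-vault terms, here is a minimal Python sketch that scales the manufacturer-reported figures to a full vault and normalizes them per terabyte. The 1,200-drive count comes from vault 1093; applying it to every model is an assumption made only for comparison, and the function names are hypothetical. It covers the drives alone, not the rest of the Storage Pod.

```python
# Manufacturer-reported figures, copied from the table above (watts).
DRIVE_POWER = {
    "ST8000DM002": {"fill": "Air", "size_tb": 8, "idle_w": 7.2, "operating_w": 9.0},
    "ST8000NM0055": {"fill": "Air", "size_tb": 8, "idle_w": 7.6, "operating_w": 8.6},
    "HUH728080ALE600": {"fill": "Helium", "size_tb": 8, "idle_w": 5.1, "operating_w": 7.4},
    "ST10000NM0086": {"fill": "Helium", "size_tb": 10, "idle_w": 4.8, "operating_w": 8.6},
}

# Drive count of vault 1093. Assumption: the same count is used for
# every model here so the comparison is like for like.
DRIVES_PER_VAULT = 1200


def watts_per_tb(model: str, state: str = "idle_w") -> float:
    """Power draw of one drive divided by its capacity in TB."""
    spec = DRIVE_POWER[model]
    return spec[state] / spec["size_tb"]


def vault_drive_power_kw(model: str, state: str = "idle_w") -> float:
    """Power drawn by the drives alone in a vault of DRIVES_PER_VAULT drives, in kW."""
    return DRIVE_POWER[model][state] * DRIVES_PER_VAULT / 1000


if __name__ == "__main__":
    for model in DRIVE_POWER:
        print(
            f"{model:16s} {watts_per_tb(model):.2f} W/TB idle, "
            f"{vault_drive_power_kw(model, 'operating_w'):.2f} kW per vault operating"
        )
```

Normalizing per terabyte matters because a larger drive that draws the same wattage still stores more data for each watt.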

I’d like 100 Petabytes of Hard Drives To Go, Please

“100 Petabytes should get us through Q4.” — Tim Nufire, Chief Cloud Officer, Backblaze

The 1,200 Seagate 10 TB drives are just the beginning. The next Backblaze Vault will be configured with 12 TB drives, which will give us 12.2 petabytes of storage in one vault. We are currently building and adding two to three Backblaze Vaults a month to our cloud storage system, so we are going to need more drives. When we did all of our “drive math,” we decided to place an order for 100 petabytes of hard drives, a mix of 10 and 12 TB models. Gleb, our CEO and occasional blogger, exhaled mightily as he signed the biggest purchase order in company history. Wait until he sees the one for Q1.
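
The “drive math” itself is not shown here. As a rough sketch of the kind of estimate involved, the Python below assumes that every new vault holds 1,200 drives, that “100 petabytes” refers to raw drive capacity, and that units are decimal (1 PB = 1,000 TB). The function name is hypothetical.

```python
DRIVES_PER_VAULT = 1200  # assumption: every new vault matches vault 1093
TB_PER_PB = 1000         # assumption: decimal units


def raw_pb_per_quarter(vaults_per_month: float, drive_tb: float, months: int = 3) -> float:
    """Raw drive capacity (PB) needed to build vaults at a given monthly rate."""
    drives = vaults_per_month * months * DRIVES_PER_VAULT
    return drives * drive_tb / TB_PER_PB


if __name__ == "__main__":
    low = raw_pb_per_quarter(2, 10)   # two vaults a month, all 10 TB drives
    high = raw_pb_per_quarter(3, 12)  # three vaults a month, all 12 TB drives
    print(f"Quarterly need: {low:.1f} to {high:.1f} PB raw")
```

Under these assumptions, two vaults a month of 10 TB drives comes to 72 PB for a quarter and three vaults a month of 12 TB drives comes to 129.6 PB, a range that brackets the 100 PB order.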

Enough drives for a 10 petabyte vault

400 Petabytes of Cloud Storage

When we added Backblaze Vault 1093, we crossed over 400 Petabytes of total available storage. For those of you keeping score at home, we reached 350 Petabytes about 3 months ago as you can see in the chart below.

Petabytes of data stored by Backblaze

Backblaze Vault Primer

All of the storage capacity we’ve added in the last two years has been on our Backblaze Vault architecture, with vault 1093 being the 60th one we have placed into service. Each Backblaze Vault is composed of 20 Backblaze Storage Pods logically grouped together into one storage system. Today, each Storage Pod contains sixty 3 ½” hard drives, giving each vault 1,200 drives. Early vaults were built on Storage Pods with 45 hard drives, for a total of 900 drives in a vault.

A Backblaze Vault accepts data directly from an authenticated user. Each data blob (object, file, group of files) is divided into 20 shards (17 data shards and 3 parity shards) using our erasure coding library. Each of the 20 shards is stored on a different Storage Pod in the vault. At any given time, several vaults stand ready to receive data storage requests.
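
The vault geometry and the 17 + 3 split lend themselves to a little arithmetic. The Python sketch below is not our erasure coding library and computes no parity; it only models drive counts, the share of raw space left for data, and a one-shard-per-pod placement. All function names are hypothetical, and decimal units (1 PB = 1,000 TB) are assumed.

```python
PODS_PER_VAULT = 20
DATA_SHARDS = 17
PARITY_SHARDS = 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS  # 20, one shard per Storage Pod


def raw_vault_capacity_pb(drives_per_pod: int, drive_tb: float) -> float:
    """Raw capacity of one vault in PB (assumes 1 PB = 1,000 TB)."""
    return PODS_PER_VAULT * drives_per_pod * drive_tb / 1000


def usable_capacity_pb(raw_pb: float) -> float:
    """Capacity left for data once 3 of every 20 shards hold parity."""
    return raw_pb * DATA_SHARDS / TOTAL_SHARDS


def shard_placement(pod_ids: list[str]) -> dict[int, str]:
    """Map shard index to pod so that every shard lands on a different pod."""
    if len(pod_ids) != TOTAL_SHARDS or len(set(pod_ids)) != TOTAL_SHARDS:
        raise ValueError(f"need {TOTAL_SHARDS} distinct pods")
    return dict(enumerate(pod_ids))


if __name__ == "__main__":
    raw = raw_vault_capacity_pb(drives_per_pod=60, drive_tb=10)
    print(f"raw: {raw:.1f} PB, after parity: {usable_capacity_pb(raw):.1f} PB")
    placement = shard_placement([f"pod-{i:02d}" for i in range(PODS_PER_VAULT)])
    print(placement[0], placement[19])
```

For a vault of sixty-drive pods with 10 TB drives, this gives 12 PB raw and 10.2 PB after parity. That is close to the 10.12 Petabytes quoted for vault 1093, though how that figure was derived is not stated here, so treat the match as suggestive only.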

Drive Stats for the New Drives

In our Q3 2017 Drive Stats report, due out in late October, we’ll start reporting on the 10 TB drives we are adding. It looks like the 12 TB drives will come online in Q4. We’ll also get a better look at the 8 TB consumer and enterprise drives we’ve been following. Stay tuned.

Other Big Data Clouds

We have always been transparent here at Backblaze, including about how much data we store, how we store it, even how much it costs to do so. Very few others do the same. But, if you have information on how much data a company or organization stores in the cloud, let us know in the comments. Please include the source and make sure the data is not considered proprietary. If we get enough tidbits we’ll publish a “big cloud” list.

Andy Klein

Director of Product Marketing at Backblaze
Andy has 20+ years of experience in technology marketing. He has shared his expertise in computer security and data backup at the Federal Trade Commission, Rootstech, RSA and over 100 other events. His current passion is to get everyone to back up their data before it's too late.
Category:  Cloud Storage
  • Oracles

    Yev or Andy, what’s the rough proportion of your energy budget that goes to spinning drives? My wild guess would be that running the solid state stuff would be the biggest chunk (80%???) and the drives themselves are smaller (although in your case every penny per drive counts, what with 10ks of them).

    How efficient are your PSUs these days? I build the workstations for our development group using “platinum” efficiency PSUs ($$$), more to keep the heat and noise in the office down than anything.

    • Andy Klein

      The overwhelming majority of our drives are hard disk drives, not SSDs, so most of our electricity goes to the HDDs. It is hard to say exactly how much is for drives versus CPUs, etc., as we treat the Storage Pods as a unit for the purpose of electrical consumption.

  • CloudBerry Backup

    Nice! Congratulations on growing so fast, guys!

  • Are you offering multi-datacenter solutions? And how do transfer latency and bandwidth compare to S3?

    • Kevin – not yet, but we have that on our roadmap so it’ll be a matter of time.

  • At this rate it would probably be cheaper to produce your own drives.

    • Arman Dezfuli-Arjomandi

      This is such a dumb comment. Don’t you think Amazon and Google would be doing that by now if it was actually cheaper? It turns out that it actually does take an entire company full of people to design and manufacture great, cost-effective hard drives at scale.

      • jsjohnst

        While the rest of your comment was generally correct, did you have to be so insulting to the person you were responding to?

        • Arman Dezfuli-Arjomandi

          Definitely not. Edited for kindness.

      • Oracles

        Plus the capital expenditure required to be able to build that first drive is probably quite large (factory, air filtration, pick-n-place robots, all that stuff). Existing drive manufacturers wrote that off long ago and only need to do incremental changes to existing production infrastructure.

  • Matei-Alexandru Marcu

    Actually, for redundancy reasons, the massive 400 PB total storage is cut down to 230-240 PB tops. That’s the effective space the customer gets. Storage is sad.

    • Stefan Seidel

      The article mentions they use 17/3 erasure coding. Wouldn’t that mean 400 PB equals 340 PB of effective storage space?

      • Matei-Alexandru Marcu

        That much storage does not only require the HDD-level parity they mentioned. It also requires a hell of a lot of management VMs, since it cannot be handled by a single mastermind computer, though I can’t tell the exact proportions. Also, in order to have a redundant disk array, one has to ensure that not only the HDDs are replaceable, but a whole disk array. That costs extra space. Taking it to higher 400 PB levels means there are just going to be hierarchically bigger arrays/blocks of storage that have to be redundant. Unless the company decides there can be single points of failure, such storage designs will always cost a lot of space for the sake of redundancy.
        The mathematical 17/3 sharding algorithm does indeed save more than a standard RAID, but the game’s rules are pretty much the same, I suppose.

  • oregondean

    I too am curious if you have statistics on how many drives are idle at any given time … not so much in a new vault because it’s still being loaded up. But in older vaults there is a combination of customers requesting downloads and customer data being updated due to an older file being deleted or dated. Do you always keep “idle” drives spinning?

    I am also curious about how you manage customer file pointers … you mentioned the split into 20 shards … but over time I’ve got to have data spread across several vaults. The servers managing this must be very busy and the database tables for all of our files must be huge? Some insight into how this works and how you instantiate redundancy might be fun for us to learn.

    • We’ll maybe write about how the software works in the future, but to your question about drives specifically, we’re at the scale now that folks do access data all the time, so even older vaults are always busy in one way or another!

      • Vlad Radu

        Yes, please do share how the software works and also what kind of server infrastructure you use.

  • grandonia

    Oh my, this is exponential growth… if you continue like that you will become the next amazon in less than 10 years!!

  • Zachary J Drummond

    How many customers do you have currently? 400 PB of data sounds like an incredible amount.

    • jsjohnst

      I have about 1/4,500th of that much storage personally (90TB), so 400PB is a lot, but not as much as you’d think.

      • Daniel Danik

        WOW! How long did it take you to upload 90TB to Backblaze?

  • cjacja

    First off it’s good that you do share information, including costs. It gives customers confidence that your company will be in business years from now.

    One thing I’d like to know is how a vault works. What sits between 1,200 SATA connectors and one (or more) network cables? What processor is inside the vault, and what file system is used?

    My guess is that if there are 1,200 drives in a box and the box only has a few GB of network bandwidth, each drive must on average be almost idle even if the network connection is “flooded”.

  • Christian Taylor

    Man, how the heck is Backblaze profitable? I love Backblaze, but I fear that your prices will need to increase sooner or later.

    • sep332

      They’re expanding because they have more customers! And the new customers are storing more data with B2 which charges per-GB instead of a flat rate, so they’re making more money that way too.

    • > Man, how the heck is Backblaze profitable?
      Practice!

    • gavingreenwalt

      Cost of pod 6.0: $0.036/GB
      B2 cost: $0.005/GB/mo

      3.6c/0.5c = 7.2 month payback + cost of networking, storage, electricity, internet bandwidth and labor.

    • Joel Venable

      They could probably use dedupe and cut their hard drive requirements in half…

      But then they’d need around 10PB of memory.