Yes, Backblaze Just Ordered 100 Petabytes of Hard Drives

By | October 5th, 2017

10 Petabyte vault, 100 Petabytes ordered, 400 Petabytes stored

Backblaze just ordered 100 petabytes’ worth of hard drives, and yes, we’ll use nearly all of them in Q4. In fact, we’ll begin the process of sourcing the Q1 hard drive order in the next few weeks.

What are we doing with all those hard drives? Let’s take a look.

Our First 10 Petabyte Backblaze Vault

Ken clicked the submit button and 10 Petabytes of Backblaze Cloud Storage came online, ready to accept customer data. Ken (aka the Pod Whisperer) is one of our Datacenter Operations Managers at Backblaze, and with that one click he activated Backblaze Vault 1093, which was built with 1,200 Seagate 10 TB drives (model: ST10000NM0086). After formatting and configuration of the disks, there is 10.12 Petabytes of free space remaining for customer data. Back in 2011, when Ken started at Backblaze, he was amazed that we had amassed as much as 10 Petabytes of data storage.

The Seagate 10 TB drives we deployed in vault 1093 are helium-filled drives. We had previously deployed 45 HGST 8 TB helium-filled drives, which taught us one of the benefits of using helium drives: they consume less power than traditional air-filled drives. Here’s a quick comparison of the power consumption of several high-density drive models we deploy:

MFR      Model            Fill    Size   Idle (1)   Operating (2)
Seagate  ST8000DM002      Air     8 TB   7.2 watts  9.0 watts
Seagate  ST8000NM0055     Air     8 TB   7.6 watts  8.6 watts
HGST     HUH728080ALE600  Helium  8 TB   5.1 watts  7.4 watts
Seagate  ST10000NM0086    Helium  10 TB  4.8 watts  8.6 watts

(1) Idle: Average idle consumption in watts as reported by the manufacturer.
(2) Operating: The maximum operational consumption in watts as reported by the manufacturer, typically for read operations.
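
To put the table in vault-scale terms, here is a small sketch that multiplies each model's manufacturer-reported wattage by the 1,200 drives in a current vault. The dictionary and function names are made up for illustration, a vault filled entirely with one of the 8 TB models is hypothetical, and the result covers the drives only: it ignores Storage Pod electronics, power supply losses, and cooling.

```python
# Manufacturer-reported figures from the table above: (idle watts, operating watts).
DRIVE_POWER_WATTS = {
    "ST8000DM002": (7.2, 9.0),
    "ST8000NM0055": (7.6, 8.6),
    "HUH728080ALE600": (5.1, 7.4),
    "ST10000NM0086": (4.8, 8.6),
}

DRIVES_PER_VAULT = 1200  # drive count of vault 1093


def vault_drive_power_kw(model, drives=DRIVES_PER_VAULT):
    """Return (idle_kw, operating_kw) for a vault filled with one drive model."""
    idle_w, operating_w = DRIVE_POWER_WATTS[model]
    return idle_w * drives / 1000, operating_w * drives / 1000


if __name__ == "__main__":
    for model in DRIVE_POWER_WATTS:
        idle_kw, operating_kw = vault_drive_power_kw(model)
        print(f"{model:<16} idle {idle_kw:5.2f} kW   operating {operating_kw:5.2f} kW")
```

Under these assumptions, 1,200 ST10000NM0086 drives idle at about 5.8 kW, versus about 8.6 kW for the same number of ST8000DM002 drives.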

I’d like 100 Petabytes of Hard Drives To Go, Please

“100 Petabytes should get us through Q4.” — Tim Nufire, Chief Cloud Officer, Backblaze

The 1,200 Seagate 10 TB drives are just the beginning. The next Backblaze Vault will be configured with 12 TB drives which will give us 12.2 petabytes of storage in one vault. We are currently building and adding two to three Backblaze Vaults a month to our cloud storage system, so we are going to need more drives. When we did all of our “drive math,” we decided to place an order for 100 petabytes of hard drives comprised of 10 and 12 TB models. Gleb, our CEO and occasional blogger, exhaled mightily as he signed the biggest purchase order in company history. Wait until he sees the one for Q1.
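
A rough version of that “drive math” can be sketched as follows. It assumes the 100 petabytes refers to raw drive capacity and, because the actual mix of 10 and 12 TB drives is not stated, it only brackets the answer with the two single-size extremes. The names are illustrative.

```python
DRIVES_PER_VAULT = 1200
ORDER_PB = 100         # size of the order, assumed to be raw drive capacity
MONTHS_IN_QUARTER = 3


def vaults_from_order(drive_tb, order_pb=ORDER_PB):
    """How many (possibly fractional) vaults an order fills with one drive size."""
    raw_pb_per_vault = DRIVES_PER_VAULT * drive_tb / 1000  # 1 PB = 1,000 TB
    return order_pb / raw_pb_per_vault


if __name__ == "__main__":
    for drive_tb in (10, 12):
        print(f"All {drive_tb} TB drives: {vaults_from_order(drive_tb):.1f} vaults")
    low, high = 2 * MONTHS_IN_QUARTER, 3 * MONTHS_IN_QUARTER
    print(f"Two to three vaults a month: {low} to {high} vaults per quarter")
```

With 10 TB drives alone the order would fill a little over eight vaults; with 12 TB drives alone, just under seven. Either way it falls inside the six to nine vaults a quarter implied by the build rate.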

Enough drives for a 10 petabyte vault

400 Petabytes of Cloud Storage

When we added Backblaze Vault 1093, we crossed over 400 Petabytes of total available storage. For those of you keeping score at home, we reached 350 Petabytes about 3 months ago as you can see in the chart below.

Petabytes of data stored by Backblaze

Backblaze Vault Primer

All of the storage capacity we’ve added in the last two years has been on our Backblaze Vault architecture, with vault 1093 being the 60th one we have placed into service. Each Backblaze Vault is comprised of 20 Backblaze Storage Pods logically grouped together into one storage system. Today, each Storage Pod contains sixty 3 ½” hard drives, giving each vault 1,200 drives. Early vaults were built on Storage Pods with 45 hard drives, for a total of 900 drives in a vault.

A Backblaze Vault accepts data directly from an authenticated user. Each data blob (object, file, group of files) is divided into 20 shards (17 data shards and 3 parity shards) using our erasure coding library. Each of the 20 shards is stored on a different Storage Pod in the vault. At any given time, several vaults stand ready to receive data storage requests.
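
The shard counts also go a long way toward explaining the capacity figures quoted earlier. As a rough sketch, assume usable space is simply raw space times the share of shards that hold data (17 of 20), and ignore file system and formatting overhead. The function names are illustrative and not part of any Backblaze software.

```python
DATA_SHARDS = 17
PARITY_SHARDS = 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS  # one shard per Storage Pod
DRIVES_PER_POD = 60


def raw_capacity_pb(drive_tb, pods=TOTAL_SHARDS, drives_per_pod=DRIVES_PER_POD):
    """Raw vault capacity in petabytes (1 PB = 1,000 TB)."""
    return pods * drives_per_pod * drive_tb / 1000


def usable_capacity_pb(drive_tb):
    """Approximate space left for customer data after parity overhead."""
    return raw_capacity_pb(drive_tb) * DATA_SHARDS / TOTAL_SHARDS


if __name__ == "__main__":
    for drive_tb in (10, 12):
        print(f"{drive_tb} TB drives: {raw_capacity_pb(drive_tb):.1f} PB raw, "
              f"about {usable_capacity_pb(drive_tb):.2f} PB usable")
```

This gives about 10.2 PB for a vault of 10 TB drives and about 12.24 PB for 12 TB drives, close to the 10.12 and 12.2 petabytes mentioned above; the small gap for the 10 TB vault is consistent with formatting overhead, which the sketch leaves out.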

Drive Stats for the New Drives

In our Q3 2017 Drive Stats report, due out in late October, we’ll start reporting on the 10 TB drives we are adding. It looks like the 12 TB drives will come online in Q4. We’ll also get a better look at the 8 TB consumer and enterprise drives we’ve been following. Stay tuned.

Other Big Data Clouds

We have always been transparent here at Backblaze, including about how much data we store, how we store it, even how much it costs to do so. Very few others do the same. But, if you have information on how much data a company or organization stores in the cloud, let us know in the comments. Please include the source and make sure the data is not considered proprietary. If we get enough tidbits we’ll publish a “big cloud” list.

Andy Klein

Director of Product Marketing at Backblaze
Andy has 20+ years experience in technology marketing. He has shared his expertise in computer security and data backup at the Federal Trade Commission, Rootstech, RSA and over 100 other events. His current passion is to get everyone to back up their data before it's too late.
  • Homey

    No matter how much I search, I just cannot find a source to buy half a dozen or more drives of 6 TB or higher. It seems retailers STILL persist in price fixing, and the sheer variation in prices for the same item is staggering. Do the Backblaze masters know of a source to buy quantities of hard drives in a package deal? I dare not call 6 to 12 drives of 6 TB or higher “large,” at least not after reading how many you lot just ordered. (I can’t help but wonder just how much YOU paid for each of the 12 TB HE drives.)

  • Oracles

    Yev or Andy, what’s the rough proportion of your energy budget that goes to spinning drives? My wild guess would be that running the solid state stuff would be the biggest chunk (80%???) and the drives themselves are smaller (although in your case every penny per drive counts, what with 10ks of them).

    How efficient are your PSUs these days? I build the workstations for our development group using “platinum” efficiency PSUs ($$$), more to keep the heat and noise in the office down than anything.

    • Andy Klein

      An overwhelming number of our drives are Hard Disk drives, not SSDs. So most of our electricity is for the HDDs. Hard to say exactly how much is for drives versus CPUs, etc as we treat the Storage Pods as a unit for the purpose of electrical consumption.

    • Elliott Sims

      I’m not sure on the exact ratios or wattages, but the hard drives themselves are definitely the clear majority of the power draw. They’re small individually, but multiplied by 45 or 60 it adds up. The CPU mostly just shuffles bits around plus some carefully-optimized Reed-Solomon and (AES-NI assisted) SSL, so it doesn’t need to be extremely powerful.

      SSDs would make a huge difference in power, but they’re still something like 5x the price. We’re eagerly awaiting the day (well, year) when they get close enough to spinning-platter prices to be worth it :)

      I don’t think our PSUs are Platinum, but they don’t really need to be: our power draw is relatively steady, so the PSUs can be sized precisely enough to stay around the narrower “optimal” range for gold/silver/bronze.

      • Oracles

        Oops, sorry, I should have been more clear: by “solid state stuff” I meant the CPU, RAM, chipset, VRs and all the non-drive parts (I didn’t mean to imply SSDs). On my builds, the single NVMe SSD probably never uses more than 6-7 W, so maybe 2-3% of total system load when everything is breathing hard. I would imagine your small CPU/RAM combo and Tons o’ Drives configuration would reverse the equation.

        • Alec Martin

          If you take the power draw of the drives (provided in the table in the article) and multiply by 60, you get something in the 300-550 watt range, whereas the CPU, ram, etc. should be less than 300W combined for the components used in the storage pod v6. So I’d estimate between 50% and 90% of the power used by a storage pod is used to power drives.

    • Alec Martin

      It turns out to be the other way around, actually (I’m an engineer at Seagate, and I am not speaking on behalf of my employer). Most of the energy dissipated by a mechanical HDD is due to air resistance and spindle motor power electronics inefficiency. The “idle” power usage stats provided in the article are for drives with platters spinning, but no I/O going on. The huge difference between the power usage of helium drives vs. air-filled drives makes this evident: The ST8000*s in the table (6 platters in air, >7W idle) vs. the ST10000NM0086 (7 platters in helium, <5W idle). So more than 80% of the energy is used for spinning disks, and less than 20% running solid state electronics (within the drive itself, not including the host computer's electronics).

  • CloudBerry Backup

    Nice! Congratulations on growing so fast, guys!

  • Are you offering multi-datacenter solutions? And how do transfer latency and bandwidth compare to s3?

    • Kevin – not yet, but we have that on our roadmap so it’ll be a matter of time.

  • At this rate it would probably be cheaper to produce your own drives.

    • Arman Dezfuli-Arjomandi

      This is such a dumb comment. Don’t you think Amazon and Google would be doing that by now if it was actually cheaper? It turns out that it actually does take an entire company full of people to design and manufacture great, cost-effective hard drives at scale.

      • jsjohnst

        While the rest of your comment was generally correct, did you have to be so insulting to the person you were responding to?

        • Arman Dezfuli-Arjomandi

          Definitely not. Edited for kindness.

      • Oracles

        Plus the capital expenditure required to be able to build that first drive is probably quite large (factory, air filtration, pick-n-place robots, all that stuff). Existing drive manufacturers wrote that off long ago and only need to do incremental changes to existing production infrastructure.

    • Alec Martin

      Keep in mind that Backblaze’s current purchase rate of about 200 PB/year is 0.04% of the global HDD market (469 EB in 2016). The HDD portion of the R&D budget of each of the 3 companies that make HDDs is many times greater than Backblaze’s total revenue. Western Digital has 80K employees, making it bigger than Nvidia, AMD, GlobalFoundries, Asus, and Facebook…combined.

  • Matei-Alexandru Marcu

    Actually, for redundancy reasons, the massive 400 PB total storage is cut down to 230-240 PB tops. That’s the effective space the customer gets. Storage is sad.

    • Stefan Seidel

      The article mentions they use a 17/3 erasure, wouldn’t that mean 400PB equals 340PB effective storage space?

      • Matei-Alexandru Marcu

        That much storage does not only require the HDD-level parity they mentioned. It also requires a hell of a lot of management VMs, since it cannot be handled by a single mastermind computer; I can’t tell the exact proportions. Also, in order to have a redundant disk array, one has to ensure that not only the HDDs are replaceable, but a whole disk array. That costs extra space. Taking it to higher 400 PB levels means there are just going to be hierarchically bigger arrays/blocks of storage that have to be redundant. Unless the company decides there can be single points of failure, such storage designs will always cost a lot of space for the sake of redundancy.
        The mathematical 17/3 sharding algorithm does indeed save more than a standard RAID, but the game’s rules are pretty much the same, I suppose.

  • oregondean

    I too am curious whether you have statistics on how many drives are idle at any given time … not so much in a new vault, because it’s still being loaded up. But in older vaults there is a combination of customers requesting downloads and customer data being updated due to an older file being deleted or dated. Do you always keep “idle” drives spinning?

    I am also curious about how you manage customer file pointers … you mentioned the split into 20 shards … but over time I’ve got to have data spread across several vaults. The servers managing this must be very busy and the database tables for all of our files must be huge? Some insight into how this works and how you instantiate redundancy might be fun for us to learn.

    • We’ll maybe write about how the software works in the future, but to your question about drives specifically, we’re at the scale now that folks do access data all the time, so even older vaults are always busy in one way or another!

      • Vlad Radu

        Yes, please do share how the software works and also what kind of server infrastructure you use.

  • grandonia

    Oh my, this is exponential growth… if you continue like that you will become the next amazon in less than 10 years!!

  • Zachary J Drummond

    How many customers do you have currently? 400 PB of data sounds like an incredible amount.

    • jsjohnst

      I have about 1/4,500th of that much storage personally (90TB), so 400PB is a lot, but not as much as you’d think.

      • Daniel Danik

        WOW! How long did it take you to upload 90TB to Backblaze?

      • Vlad Radu

        Pornhub has more than that.

        • jsjohnst

          I’m sure a bunch of folks have orders of magnitude more than I do. This wasn’t a d#$% measuring contest; I gave the example to illustrate why something might not be as big as it seems.

  • cjacja

    First off it’s good that you do share information, including costs. It gives customers confidence that your company will be in business years from now.

    One thing I’d like to know is how a vault works. What sits between 1,200 SATA connectors and one (or more) network cables? What processor is inside the vault, and what file system is used?

    My guess is that if there are 1,200 drives in a box and the box only has a few GB of network bandwidth, each drive must on average be almost idle, even if the network connection is “flooded”.

  • Christian Taylor

    Man, how the heck is Backblaze profitable? I love Backblaze, but I fear that your prices will need to increase sooner or later.

    • sep332

      They’re expanding because they have more customers! And the new customers are storing more data with B2 which charges per-GB instead of a flat rate, so they’re making more money that way too.

    • > Man, how the heck is Backblaze profitable?

    • gavingreenwalt

      Cost of pod 6.0: $0.036/GB
      B2 cost: $0.005/GB mo

      3.6c/0.5c = 7.2 month payback + cost of networking, storage, electricity, internet bandwidth and labor.

    • Joel Venable

      They could probably use dedupe and cut their hard drive requirements in half…

      But then they’d need around 10PB of memory.