Our 6 TB Hard Drive Face-Off

December 15th, 2014


Backblaze is transitioning from 4 TB hard drives to 6 TB hard drives in the Storage Pods we will be deploying over the coming months. With over 10,000 hard drives to purchase over the next several months, the choice of which 6 TB hard drive to use is critical. Let’s take a look at how we’re navigating this transition.

Getting Started
We started the process in September when we purchased and deployed our first 6 TB hard drives: Western Digital (WD60EFRX) and Seagate (STBD6000100) models. We deployed two Storage Pods, one filled with 45 Western Digital drives and the other with 45 Seagate drives. The Western Digital Storage Pod was designated UL796 and the Seagate Storage Pod was designated UL800. The two Pods were identical in design and configuration except for the hard drives used.

Set Up
In a previous post we described how we set up and load tested our Storage Pods. Each of the 6 TB-drive Pods passed its setup and load testing without incident and was deployed, first the Western Digital Pod and, a few days later, the Seagate Pod. For comparison purposes we also deployed Storage Pod UL838, which had 4 TB HGST drives installed.

Into the Fire
Backblaze currently receives 130 TB of data from our customers to store each day. The data arrives in similarly sized encrypted blocks. On any given day, there are 20-40 Storage Pods accepting these data blocks as they arrive. As a block arrives, it is passed to a Storage Pod; if that Pod is busy, the block is passed to the next Pod in line. Over the course of each day this gives all the available Storage Pods the same opportunity to accept data at the same rate.

130,000 GB
The amount of customer data Backblaze stores each day.
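
Here’s a rough sketch of that dispatch logic in Python – an illustration only, not our actual code; the Pod class and its busy check are hypothetical stand-ins:

    class Pod:
        """Hypothetical stand-in for a Storage Pod."""
        def __init__(self, name, busy=False):
            self.name = name
            self.busy = busy
            self.blocks = []

        def is_busy(self):
            return self.busy   # stand-in for a real load check

        def accept(self, block):
            self.blocks.append(block)

    def dispatch(block, pods, start):
        """Offer the block to each Pod in line, skipping busy Pods.

        Returns the index the next block starts from, so that over a day
        every available Pod gets the same opportunity to accept data.
        """
        for step in range(len(pods)):
            pod = pods[(start + step) % len(pods)]
            if not pod.is_busy():
                pod.accept(block)
                return (start + step + 1) % len(pods)
        raise RuntimeError("no Storage Pod available")

    pods = [Pod("UL796"), Pod("UL800", busy=True), Pod("UL838")]
    start = dispatch("block-1", pods, 0)      # UL796 takes it
    start = dispatch("block-2", pods, start)  # UL800 is busy, so UL838 takes it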

Newly installed Storage Pods accept data essentially unencumbered until they reach 80% full. At that point a Storage Pod reduces the amount of data it accepts each day. New Storage Pods come online on a regular basis, so arriving data always has a place to go without delay.
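
A minimal sketch of that 80% rule follows; the throttled rate is an assumption, since the actual reduced rate isn’t given here:

    FULL_THRESHOLD = 0.80    # Pods throttle intake once 80% full
    THROTTLE_FACTOR = 0.5    # assumed slowdown; actual factor not published

    def daily_intake_limit(stored_tb, total_tb, base_limit_tb):
        """How much data (TB) a Pod may accept today."""
        if stored_tb / total_tb < FULL_THRESHOLD:
            return base_limit_tb                # essentially unencumbered
        return base_limit_tb * THROTTLE_FACTOR  # reduced intake past 80%

    # A Pod holding 48.87 of 211.14 TB (about 23% full) still accepts
    # data at the full rate:
    print(daily_intake_limit(48.87, 211.14, base_limit_tb=5.0))  # -> 5.0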

When UL796, UL800, and UL838 came online, we recorded various statistics regarding how much data they loaded. Here’s a sample of the relevant storage data collected on one of those days.

Date       Pod ID   Data Stored   Free Space   Total Space   Status
9/17/2014  UL796    48.87 TB      162.27 TB    211.14 TB     Active
9/17/2014  UL800    29.16 TB      181.98 TB    211.14 TB     Active
9/17/2014  UL838    6.81 TB       133.95 TB    140.76 TB     Active

The amount of data each of these Storage Pods received on each day of the observation period is shown below. We stopped recording when a given Storage Pod reached 80% full.
[Chart: data stored per day for each Storage Pod]

[Chart: data stored by days in service for each Storage Pod]

Since the three Storage Pods did not start accepting data on the same day, their day-by-day totals are not directly comparable. The following chart aligns the three Storage Pods over the same calendar days, e.g., Day 1 is Sep. 19th, Day 2 is Sep. 20th, etc.

[Chart: data stored per day, aligned by calendar day]

Observe that the Storage Pods filled up at different rates. Here is how quickly each Pod reached 80% capacity and how much data it held at that point.

Pod ID   Drive Brand       Days to 80% Full   Data Stored at 80%
UL796    Western Digital   33 days            169 TB
UL800    Seagate           42 days            169 TB
UL838    HGST              23 days            113 TB

This translates to the following average data stored per day for each Storage Pod (the data stored at 80% divided by the number of days to get there).

Pod ID   Drive Brand       Data Loaded
UL796    Western Digital   5.12 TB/day
UL800    Seagate           4.02 TB/day
UL838    HGST              4.91 TB/day

The Western Digital hard drives loaded data faster than the Seagate drives and even edged out the HGST drives.
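
As a quick sanity check, the per-day figures follow directly from the previous table:

    # Average daily load = data stored at 80% full / days to get there
    pods = {
        "UL796 (Western Digital)": (169, 33),   # (TB at 80% full, days)
        "UL800 (Seagate)":         (169, 42),
        "UL838 (HGST)":            (113, 23),
    }
    for name, (tb, days) in pods.items():
        print(f"{name}: {tb / days:.2f} TB/day")
    # -> 5.12, 4.02, and 4.91 TB/day, matching the table above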

Evaluation
Let’s review the Seagate and Western Digital drives so far:

  1. Initial reliability (how many drives failed) – No failures.
  2. Running reliability (3 months) – No failures.
  3. SMART stats (3 months) – No error conditions recorded for the five stats we utilize.
  4. Hard drive cost – About the same.
  5. Energy use – The Seagate drives are 7,200 RPM and used slightly more electricity than the Western Digital drives, which are 5,400 RPM. This small difference adds up when you place 45 drives in a Storage Pod and then stack 10 Storage Pods in a cabinet. (A rough sketch of this math follows the list.)
  6. Loading speed – Edge to Western Digital, by a little over 1 TB per day on average.
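
To put the energy difference in perspective, here’s a back-of-the-envelope sketch; the per-drive wattage difference is an assumed figure for illustration, not a measurement:

    DRIVES_PER_POD = 45
    PODS_PER_CABINET = 10
    EXTRA_WATTS_PER_DRIVE = 1.5   # assumed 7,200 vs. 5,400 RPM delta

    extra_watts = DRIVES_PER_POD * PODS_PER_CABINET * EXTRA_WATTS_PER_DRIVE
    extra_kwh_per_year = extra_watts * 24 * 365 / 1000
    print(f"{extra_watts:.0f} W extra per cabinet, ~{extra_kwh_per_year:.0f} kWh/year")
    # -> 675 W extra per cabinet, ~5913 kWh/year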

Our goal is to find hard drive models that are reliable and cost effective in our environment. The “One Pod” test has proven over the years to be a good starting point for eliminating hard drive models that are obviously incompatible with our environment.

Next Step: Scaling the Test
Based on the results, we have ordered 230 Western Digital drives to fill 5 Storage Pods (with 5 spare drives). These will be installed, load tested, and deployed shortly. Assuming the Western Digital drives continue to perform well across the 5 Storage Pods, we’ll move forward with using the Western Digital 6 TB drives in our Storage Pods over the coming months.

Is Seagate Shut Out?
The Seagate 6 TB drives performed well, albeit loading a little more slowly. They also use a little more electricity than we’d like. Still, they are not shut out. We really like to have multiple qualified hard drives to order and use in our Storage Pods; diversification is good. To that end, we expect to order additional Seagate 6 TB hard drives over the coming months, build them into Storage Pods, and monitor how they perform. Assuming they perform well, the 6 TB Seagate hard drives will be added to our list of qualified drives.

Victory is Fleeting
Today the Western Digital hard drives are first in line as our choice for 6 TB drives. Of course, we just ordered 45 HGST 8 TB Helium hard drives for testing. Their unit price is still too high for cost-effective deployment, but thanks to the return of a relentlessly decreasing hard drive pricing curve, it could be just a matter of time. In fact, Seagate is now beginning to ship their 8 TB shingled magnetic recording (SMR) hard drives for a reported $260 a drive. Availability is spotty at the moment but is certain to improve over the coming months. Using either the HGST or the Seagate 8 TB drives means a 360 TB Storage Pod is imminent – we can hardly stand the wait…

 

Andy Klein

Andy has 20+ years experience in technology marketing. He has shared his expertise in computer security and data backup at the Federal Trade Commission, Rootstech, RSA and over 100 other events. His current passion is to get everyone to back up their data before it's too late.
  • Rick Peralta

    > Backblaze currently receives 130 TB of data from our customers to store
    each day… there are 20-40 Storage Pods

    WOW! That is about 1.5 GB/s, or on the order of 50 MB/s per pod.

    Sort of shifts the notion of performance!

    Do you publish peak I/O numbers to the pods?

  • Jon

    Very informative. Nothing short of excellent!

  • A very comprehensive comparison you’ve made between Western Digital and
    Seagate 6 TB hard drives! It would help businesses in making their hard
    drive purchasing decisions. With prices decreasing, it is certainly good
    news for all! Thanks Andy!

  • Jon

    Thanks for sharing this data.

    I’m looking forward to hearing about the Seagate Archive HDD 8TB. They look tempting for my needs, but I’m wondering where the weak points really are. I’d prefer an 8 TB WD Red, but I’m not that brand loyal when it comes down to it.

  • Mary Lee

    You know, if my parents (God rest their souls) were still alive and could read these comments, they’d think everyone had a little too much whiskey and were speaking gibberish.

  • handleym

    Please do what you can to clarify the strengths and weaknesses of the SMR Seagate drives.
    This is the biggest change to HD characteristics in years, and I think we’re all curious about how well it REALLY behaves:
    – for streaming writes
    – for lots of small writes
    – for reads
    – for reliability (does constant rewriting driven by small writes lead to higher temperature or mechanicals that die faster?)

    I think we’re all REALLY looking forward to that post.
    (Helium 8TB is less interesting, IMHO — just more of the performance characteristics we’ve seen at 4TB and 6TB, and for the past ten years or so, with, presumably, the usual HGST reliability.)

    • GrandpaReindeer

      I think BB only writes large blocks (with a few reads).

      People comparing BB to other systems need to remember that BB primarily writes users’ data, with a few retrievals (in no special hurry). It is NOT a typical system with lots of rapid random reads and writes all day long.

  • bbgooble

    Buy more bandwidth already, jesus. Your service is shit if you can only back up data at less than 300 KB/sec.

    • dgw

      Tentatively (I just installed the client and it’s still scanning my drives), ^ this.

      Backblaze is accepting an average of about 1.5 GB per second according to these stats, but the client says that my backup should run at about “2 GBytes/Day”. So I get about 1.3 seconds’ worth of transmission time per day? Seems low, unless it’s a deliberate plan to keep users from backing up too much data in their “unlimited” backup plans.

      • GrandpaReindeer

        I believe the automatic throttling exists more because ISP upload speeds are MUCH slower than download speeds and BB doesn’t want to overload internet connections. The only time it’s really a problem is the first backup, or if one made a LOT of changes; otherwise, the regular incremental backups run very fast. (Friends with Carbonite report a similar 1-2+ day first backup.)

        • dgw

          1-2 days? I guess most people have much less than 3 TB in their initial set. At any rate, my connection can handle at least 4 MB/s up and 10 MB/s down, so cranked up to maximum performance it’s estimating 30 GB/day or so now. That’s better; I guess the initial estimate is conservative.

    • You can increase the number of threads a backup process uses; I’ve increased it to 10 and was uploading at 300+ Mbps. Backblaze wasn’t the choke point in my case.

  • IJK

    Looked into Backblaze. They only seem to have a Windows .exe file as a client. Went to CrashPlan. They have a Linux client, its monthly fee is a bit less than Backblaze’s, and it works like a charm. It’s a shame Backblaze did not want my money.

    • dakishimesan

      BB has a Mac client too. CP uses Java, making it cross-platform but much more CPU intensive than BB.

      • dgw

        I just cashed in the remaining time on my CrashPlan subscription and installed Backblaze for the very reason that the Java client often brought my system to its knees, both during backup (I/O & CPU) and during scanning (mostly I/O).

        • dakishimesan

          Agree, there is no comparison. A backup program should work like Backblaze: it should operate almost completely silently in the background with very little resource use.

          • GrandpaReindeer

            I do get some momentary lags (about 20-30 secs) when my system appears to hang. I’ve found it’s when Backblaze first starts checking files, preparing for a backup. As long as I know what it is, I can live with a 30sec pause every two+ hours. If I don’t want to be interrupted, I tell BB to backup and then pause it…good for two hours.

  • Randall Edo. Badilla

    Great post! But you hardly mention “Energy Use”… do you have figures for the pods when idle, running, or even in energy-saver mode? Maybe kWh for each pod? Anyway, thanks!!!

  • Multimediavt

    Odd. The Seagate drive has a higher spindle speed and twice the cache memory, yet took in less data than the WD drive. That shouldn’t happen, theoretically. I know you guys don’t pull fast ones with data, so I am really interested in finding out why a drive with significantly better specs is being bested by a lower-spec’d drive. I want to think this has something to do with the software side of your setup not managing the larger-cache drives well, but really I am throwing darts in the dark.

    • Multimediavt

      Bump.

      Anyone from Backblaze able to respond to my post?

      • Andy Klein

        We saw the “anomaly” early on and tried to figure it out, but there was nothing obvious. That said, the ability to accept either 4 TB/day or 5 TB/day worked for us, as both drives accepted data without creating a choke point. We’ll look into it further as we bring more Seagate drives online to see if this is consistent or not. For us, the power consumption was the main reason we went forward with WD, but we’ll keep working with the Seagate 6 TB drives as a potential second supplier.

    • mayhempk1

      I know this is a late reply, but there’s a LOT more to hard drives than just their RPM and cache. RPM mainly matters for random I/O, whereas linear density can vary a lot between two brands of drives.

  • Danny Williams

    The technology for the 8 TB is cheaper. They say the new Seagates are going to retail for $260.

  • zgirod

    Out of curiosity, what is it about the Western Digital that lets it load data faster at 5,400 RPM than the Seagate at 7,200 RPM?

    • sc3pilot

      Better NCQ algorithms

    • B Brad

      Keep in mind bandwidth and RPM are independent. 7,200 RPM helps random I/O but does nothing for bandwidth. Bandwidth is RPM × linear density, and both density and RPM can vary. There can be bottlenecks elsewhere as well: handling multiple outstanding transactions, caching and buffering, and being able to cache more than a full track can have large effects.
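
      As a worked example (the per-track figure is an assumed illustration, not a spec for either drive):

        # Sequential bandwidth ~= revolutions per second * data per track
        track_mb = 1.5                                        # assumed MB per track
        print(f"~{(7200 / 60) * track_mb:.0f} MB/s")          # -> ~180 MB/s
        # A 5,400 RPM drive with 35% higher linear density wins anyway:
        print(f"~{(5400 / 60) * track_mb * 1.35:.0f} MB/s")   # -> ~182 MB/s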

      • Multimediavt

        The Seagate drive also has twice the cache (128 MB). That should even out or best the linear-density difference. I still wonder if it’s a drive thing or a software-accessing-the-drive thing. Gonna poke around at other reviews and info about those particular drive models while waiting to see what someone at Backblaze responds with … if they respond.

        • user4321

          From the way I read the article, it just appears the WD pod was first in line, so it got the data first… which doesn’t really give any indication as to the performance of the drives. Maybe I just need to read the article again, but that is how it appears to me.

          • Multimediavt

            I did go back and look again, and it’s hard to tell if that’s the case. The problem is there are a number of things it could be, but all the reviews I’m finding show the Seagate drives performing better in a NAS config. It could also be a hardware or firmware issue with that Seagate model creating more latency. It’s just weird, and oddities create curiosity. :)

          • Kyle

            These graphs aren’t apples to apples by any means. Backblaze just likes finding statistics that make Seagate look bad; since none of the Seagate drives failed, they had to find another graph.

  • Ryan Roberts

    What are you doing with the old 4 TB drives? Want to sell a few?

    • Brandon Bennett

      Hopefully they shred them to avoid any leak of customer data (encrypted or not)

      • Multimediavt

        Personally, I prefer a drill press and a tungsten bit. Doesn’t take long to press a few holes in the platters. There’s also the extreme method of removing the platters and either shattering them (if glass) or crushing them (if metal). The drill often shatters the glass platters anyway.

        • sc3pilot

          That only destroys data where the holes are and the immediate vicinity around them.

          • Bump77

            Drill 2 holes in ’em and throw ’em in a bucket of saltwater. They don’t need more than 2 weeks in it, and not even the best copy-rats will get anything out of ’em.

          • bobjr94

            My uncle used to work at a bank; they used to have people bring the drives home and drill several holes through the drives in their garages. Very effective, but the problem was people driving around with bank hard drives in their cars.

          • mike_s123

            No one has ever demonstrated the ability to retrieve useful information off such a platter.

          • Multimediavt

            I hope that was meant to be funny, cuz it made me laugh.

          • dakishimesan

            “That only destroys data where the holes are and the immediate vicinity around them.”

            That’s actually probably not the case – data on rotary drives is stored in concentric rings (tracks), so piercing the rings would make it almost impossible to restore the bits in any coherent order, except for fully undamaged ring sections of the platter.

          • GrandpaReindeer

            Also seems like the holes would either catch the heads or destroy the airflow that keeps them slightly off the disks, causing severe head crashes. Seems it would be pretty hard to get a drive to negotiate around and NEVER cross over a hole.

        • Jesper Monsted

          I’ve been eyeing an electric wood cleaver to permanently destroy the platters.

          • loopyduck

            Let’s just skip ahead to thermite.

          • stretch.kerr

            Thermite is underwhelming, actually. It looks cool, but it didn’t melt the platters as expected. If you have the time, dismantle the drives, THEN thermite the platters. If you thermite the whole drive, a lot of the thermite is used up getting through the shell and/or the PCB, depending on which way up the drive is when you pull the pin on the thermite. https://plus.google.com/photos/110686910997756003632/albums/5636795932587529697?authkey=CNjg8ZbrnsXV-QE

    • GrandpaReindeer

      I believe the old drives are kept in service until they die. The new drives are used for new data from customers old and new.

  • maktt

    Linux client already!

    • Multimediavt

      Having looked at the service, I now see the point of your post. Yes, client software for OSes other than Windows would be nice.

  • tmikaeld

    How do you handle the Western Digital ‘IntelliPark’ feature that constantly parks the heads every 8 seconds, causing them to fail early?

    http://www.instantfundas.com/2011/12/intellipark-makes-western-digital-green.html

    And yes, the WD Red also has the ‘IntelliPark’ feature.

      • Nicholas Oldroyd

        IntelliPark was ON by default on the WD Red I bought. I had to turn it off (it was set to 8 seconds) with the WD idle3 tool.

        • Mr.Burns

          How do you check that and how do you change that?

          • Buchan Milne

            Under Linux/BSD:

            # smartctl -A /dev/sdb | grep -E '^ *(4|193|12)'
              4 Start_Stop_Count    0x0032  100  100  000  Old_age  Always  -  80
             12 Power_Cycle_Count   0x0032  100  100  000  Old_age  Always  -  80
            193 Load_Cycle_Count    0x0032  200  200  000  Old_age  Always  -  14

            The value (in the last column) for Load_Cycle_Count should be the same order of magnitude as the Power_Cycle_Count and Start_Stop_Count values.

            The above was from a 3 TB WD “Red”, which doesn’t seem to have this problem:

            # smartctl -i /dev/sdb|grep Model
            Model Family: Western Digital Red (AF)
            Device Model: WDC WD30EFRX-68AX9N0

            Its RAID partner looks basically identical (both at about 14,000 Power_On_Hours).

            See more discussion at http://forums.freenas.org/index.php?threads/hacking-wd-greens-and-reds-with-wdidle3-exe.18171

        • Phillip

          I assume you used wdidle3 to turn it off? Newer Red drives are supposed to come with newer firmware that doesn’t park the heads as often. You can download a firmware updater at http://support.wdc.com/product/download.asp?groupid=619&sid=201&lang=en

          I bought a bunch of the 6 TB drives when they came out, and I didn’t want to run wdidle3 on them as there are warnings not to. So I didn’t, and at 2,416 power-on hours my drive has only 1,167 head parks. With it rated at 600,000, I’d say it’s not really an issue anymore as long as the drives have the newer firmware.
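
          A quick extrapolation shows how comfortable that margin is (assuming my park rate stays constant):

            # 1,167 head parks in 2,416 power-on hours vs. a 600,000-park rating
            parks, hours, rating = 1167, 2416, 600_000
            hours_to_rating = rating / (parks / hours)            # ~1.24 million hours
            print(f"~{hours_to_rating / (24 * 365):.0f} years")   # -> ~142 years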

          • Nicholas Oldroyd

            I used http://idle3-tools.sourceforge.net/ to turn it off for my 4TB Red.

            My Red saw over 20,000 cycles in 1 month before I realized it was a problem. Now it’s lucky to see 1 cycle a week or so.

  • rf

    I suppose you guys aren’t like many workloads in terms of how SMR would work for you, but I’d love to hear anything (measurements, impressions, really anything) you have to say on it when you try it, since word from people who actually have it deployed (whether it was fussy to work with, what, if anything, you had to do differently versus regular drives, …) is pretty thin on the ground.

  • B. Guyas

    Fascinating post! You’re receiving data at 1.5 GB/s. At one point I remember you saying you have two 10 Gbps links, which comes out to 2.5 GB/s, so your connection isn’t quite the limiting factor (unless overhead impacts throughput). It’s mind-blowing how many individual encrypted data packets your pods receive per second, let alone per day!

    • greenlight

      1.5 GB/s would be the average; there are probably peaks during work hours and valleys during the night.