One Billion Drive Hours and Counting: Q1 2016 Hard Drive Stats

May 17th, 2016

Q1 2016 Hard Drive Stats

For Q1 2016 we are reporting on 61,590 operational hard drives used to store encrypted customer data in our data center. That is 9.5% more hard drives than in our last review, when we evaluated 56,224 drives. In Q1 2016, the hard drives in our data center, past and present, totaled over one billion hours in operation to date. That’s nearly 42 million days, or 114,155 years’ worth of spinning hard drives. Let’s take a look at what these hard drives have been up to.
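As a quick sanity check, the conversions quoted above can be reproduced from the three figures in the paragraph. This is only a sketch: the variable names are ours, and a 365-day year is assumed.

```python
# Figures quoted in the paragraph above.
TOTAL_DRIVE_HOURS = 1_000_000_000
DRIVES_NOW = 61_590
DRIVES_LAST_REVIEW = 56_224

drive_days = TOTAL_DRIVE_HOURS / 24
drive_years = drive_days / 365  # assumes 365-day years
growth_pct = 100 * (DRIVES_NOW - DRIVES_LAST_REVIEW) / DRIVES_LAST_REVIEW

print(f"{drive_days:,.0f} days, {drive_years:,.0f} years, {growth_pct:.1f}% more drives")
```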

Backblaze hard drive reliability for Q1 2016

Below are the hard drive failure rates for Q1 2016. These are just for Q1 and are not cumulative; the cumulative chart appears later.

Q1 2016 Hard Drive Stats

Some observations on the chart:

  1. The list totals 61,523 hard drives, not the 61,590 noted above. This chart does not list drive models of which we have fewer than 45 drives.
  2. Several models have an annual failure rate of 0.00%. They had zero hard drive failures in Q1 2016.
  3. Failure rates with a small number of failures can be misleading. For example, the 8.65% failure rate of the Toshiba 3TB drives is based on one failure. That’s not enough data to make a decision.
  4. The overall Annual Failure Rate of 1.84% is the lowest quarterly number we’ve ever seen.
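
To illustrate observation 3, here is a minimal sketch of how an annualized failure rate and its uncertainty could be computed. It assumes the rate is defined as failures per drive-year of operation, and it uses an exact Poisson confidence interval, which is our choice of technique and not something the charts report. The function names and the drive-day count in the example are hypothetical.

```python
import math

def annualized_failure_rate(failures, drive_days):
    """AFR in percent, assuming AFR = failures per drive-year of operation."""
    return 100 * failures / (drive_days / 365)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(k + 1))

def _solve_cdf(k, target):
    """Find lam with poisson_cdf(k, lam) == target by bisection.

    The CDF decreases as lam grows, so the search interval shrinks
    toward the crossing point.
    """
    lo, hi = 0.0, 2.0 * k + 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_interval(failures, alpha=0.05):
    """Exact two-sided interval for the expected number of failures."""
    lower = 0.0 if failures == 0 else _solve_cdf(failures - 1, 1 - alpha / 2)
    upper = _solve_cdf(failures, alpha / 2)
    return lower, upper

# Hypothetical input: one failure over 4,200 drive days (not a figure from the chart).
failures, drive_days = 1, 4_200
low, high = poisson_interval(failures)
print(f"AFR: {annualized_failure_rate(failures, drive_days):.2f}%")
print(f"95% interval: {annualized_failure_rate(low, drive_days):.2f}% "
      f"to {annualized_failure_rate(high, drive_days):.2f}%")
```

With these made-up inputs the point estimate is about 8.7%, while the 95% interval runs from roughly 0.2% to about 48%, which is why a rate based on a single failure says very little.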

Cumulative hard drive reliability rates

We started collecting the data used in these hard drive reports on April 10, 2013, just about three years ago. The table below is cumulative as of 3/31 for each year since 4/10/2013.

Cumulative Q1 2016 Hard Drive Failure Rates

One billion hours of spinning hard drives

Let’s take a look at what the hard drives we own have been doing for one billion hours. The one billion hours is a sum of all the data drives, past and present, in our data center. For example, it includes the WDC 1.0TB drives that were recently retired from service after an average of 6 years in operation. Below is a chart of hours in service to date ordered by drive hours:

Q1 2016 Hard Drive Service Hours

The “Others” line accounts for the drives that are not listed because there are or were fewer than 45 drives in service.

In the table above, the Seagate 4TB drive leads in “hours in service” but which manufacturer has the most hours in service? The chart below sheds some light on this topic:
Hard Drive Service Hours by Manufacturer

The early HGST drives, especially the 2- and 3TB drives, have lasted a long time and have provided excellent service over the past several years. This “time-in-service” currently outweighs the sheer quantity of Seagate 4TB drives we have purchased and placed into service over the last year or so.

Another way to look at drive hours is to see which drives, by size, have the most hours. You can see that in the chart below.
Hard Drive Service Hours by Drive Size

The 4TB drives have been spinning for over 580 million hours. There are 48,041 4TB drives, which means each drive has had on average 503 drive days of service, or 1.38 years. The lifetime annualized failure rate for all 4TB drives is 2.12%.
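
The per-drive average follows directly from the two figures quoted above. A small sketch, using our own variable names and assuming a 365-day year:

```python
HOURS_4TB = 580_000_000   # "over 580 million hours", so treat this as a lower bound
DRIVES_4TB = 48_041

avg_hours = HOURS_4TB / DRIVES_4TB
avg_days = avg_hours / 24
avg_years = avg_days / 365  # assumes 365-day years

print(f"{avg_days:.0f} drive days per drive, or {avg_years:.2f} years")
```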

Hard Drive Reliability by Manufacturer

The drives in our data center come from four manufacturers. As noted above, most of them are from HGST and Seagate. With that in mind, here are the hard drive failure rates by manufacturer. We’ve combined all of the drives, regardless of size, for a given manufacturer. The results are divided into one-year periods ending on 3/31 of 2014, 2015, and 2016.
Hard Drive Failure Rates by Manufacturer

Why are there fewer than 45 drives?

A couple of times we’ve noted that we don’t display drive models with fewer than 45 drives. Why would we have fewer than 45 drives given we need 45 drives to fill a Storage Pod? Here are a few of the reasons:

  1. We once had 45 or more drives, but some failed, we couldn’t get replacements of that model, and now we have fewer than 45.
  2. They were sent to us as part of our Drive Farming efforts a few years back and we only got a few of a given model. We needed drives and while we liked using the same model, we utilized what we had.
  3. We built a few Frankenpods that contained drives that were the same size in terabytes but had different models and manufacturers. We kept all the drives in a RAID array the same model, but there could be different models in each of the 3 RAID arrays in a given Frankenpod.

Regardless of the reason, if we have fewer than 45 drives of the same model, we don’t display them in the drive stats. We do, however, include their information in any “grand total” calculations such as drive space available, hours in service, failures, etc.

Buying drives from Toshiba and Western Digital

We often get asked why we don’t buy more WDC and Toshiba drives. The short answer is that we’ve tried. These days we need to purchase drives in reasonably large quantities, 5,000 to 10,000 at a time. We do this to keep the unit cost down and so we can reliably forecast our drive cost into the future. For Toshiba we have not been able to find their drives in sufficient quantities at a reasonable price. For WDC, we sometimes get offered a good price for the quantities we need, but before the deal gets done something goes sideways and the deal doesn’t happen. This has happened to us multiple times, as recently as last month. We would be happy to buy more drives from Toshiba and WDC if we could. Until then, we’ll continue to buy our drives from Seagate and HGST.

What about using 6-, 8- and 10TB drives?

Another question that comes up is why the bulk of the drives we buy are 4TB versus the 5-, 6-, 8- and 10TB drives now on the market. The primary reason is that the price/TB for the larger drives is still too high, even when considering storage density. Another reason is availability of larger quantities of drives. To fill a Backblaze Vault built from 20 Storage Pod 6.0 servers, we need 1,200 hard drives. We are filling 3+ Backblaze Vaults a month, but the larger size drives are hard to find in quantity. In short, 4TB drives are readily available at the right price, with 6- and 8TB drives getting close on price, but still limited in the quantities we need.
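
The quantities involved can be worked out from the figures above. The sketch below treats “3+” vaults a month as a minimum and compares it with the 5,000 to 10,000 purchase sizes mentioned earlier; the variable names are ours.

```python
PODS_PER_VAULT = 20
DRIVES_PER_VAULT = 1_200
VAULTS_PER_MONTH = 3          # "3+" in the text, so this is a minimum

drives_per_pod = DRIVES_PER_VAULT // PODS_PER_VAULT
drives_per_month = DRIVES_PER_VAULT * VAULTS_PER_MONTH

print(f"{drives_per_pod} drives per pod, at least {drives_per_month:,} drives per month")
for purchase in (5_000, 10_000):
    months = purchase / drives_per_month
    print(f"A purchase of {purchase:,} drives lasts at most about {months:.1f} months")
```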

What is a failed hard drive?

For Backblaze there are three reasons a drive is considered to have “failed”:

  1. The drive will not spin up or connect to the OS.
  2. The drive will not sync, or stay synced, in a RAID Array (see note below).
  3. The Smart Stats we use show values above our thresholds.

Note: Our stand-alone Storage Pods use RAID-6, while our Backblaze Vaults use our own open-sourced implementation of Reed-Solomon erasure coding. Both techniques have a concept of a drive not syncing or staying synced with the other member drives in its group.

A different look at Hard Drive Stats

We publish the hard drive stats data on our website with the Q1 2016 results there as well. Over the years thousands of people have downloaded the files. One of the folks who downloaded the data was Ross Lazarus, a self-described grumpy computational biologist. He analyzed the data using Kaplan-Meier statistics and plots, a technique typically used for survivability analysis. His charts and analysis present a different way to look at the data and we appreciate Mr. Lazarus taking the time to put this together. If you’ve done a similar analysis of our data, please let us know in the comments section below – thanks.
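
For readers unfamiliar with the Kaplan-Meier technique mentioned above, here is a minimal sketch of the estimator in plain Python. It is not Mr. Lazarus’s code, and the five drives in the example are invented for illustration. Each drive contributes a duration (days in service) and a flag saying whether it failed or was still running (censored) when last observed.

```python
from collections import Counter

def kaplan_meier(durations, failed):
    """Return [(time, survival_probability)] at each distinct failure time.

    durations: days each drive was observed
    failed:    True if the drive failed at that time, False if it was
               still working when observation ended (censored)
    """
    deaths = Counter(t for t, f in zip(durations, failed) if f)
    exits = Counter(durations)   # every drive leaves the risk set at its duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk drives that survived time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical observations for five drives (not Backblaze data).
example_durations = [100, 200, 200, 300, 400]
example_failed = [True, False, True, False, True]
for day, prob in kaplan_meier(example_durations, example_failed):
    print(f"day {day}: estimated survival {prob:.2f}")
```

Unlike a single annualized rate, the resulting curve shows how the estimated chance of survival changes with drive age, and censored drives still count toward the at-risk population until they drop out of observation.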

Andy Klein

Andy has 20+ years of experience in technology marketing. He has shared his expertise in computer security and data backup at the Federal Trade Commission, Rootstech, RSA and over 100 other events. His current passion is to get everyone to back up their data before it’s too late.
  • Max

    In a long period of time all drives even the most reliable will break. All values are scrambled. This chart makes no sense.

  • William Atkinson (TheComputerG

    Seagate drives: I expected failure rates like that based on the high (96%+) failure rate of these drives. I have a Seagate about to die, trying to get data off of it and it’s quite slow. Errors left and right, and SMART stats are worrying.

  • Everett Troya

    So with the data above what would you say is the number one drive that is on the market to date?

  • Bron Fieldwalker

    Given your last enterprise test was 3 years ago, wondering if you have any more done? Also, in talking to a certain SAN vendor, they advised me to stick to SAS drives as they were all in general higher quality than SATA. New enterprise drives also have things like better RAID rebuild processes and times, etc. Have you experienced anything related to that on your RAID-6 boxes?

  • Dennis Ng

    Given there is no data privacy here, is it possible to post raw data to do some statistical analysis?

  • Sometimes I have a hard time believing these really low failure rates, as I get way higher failure rates than that… more like 70-80% of drives die on me and it makes me think manufacturers are hiding the real numbers. 4 out of 5 WD Black 2TBs died on me within 2 years, they were a terrible drive. I also had a WD Black 500GB die on me, another slightly older 250GB HDD in my wife’s PC died too recently, and a 500GB Seagate laptop drive just died today too. I have a hard time keeping up because so many drives are dropping like flies lately. Slowly replacing OS drives with SSDs is all I can do.

  • Most excellent article!!!!

  • a b

    One Question: Unlike all other HDD manufacturers, why doesn’t Toshiba list the drive’s spec for: MTBF and Load/Unload values for their X300 drives ? I’ve attempted to obtain this info from Toshiba Japan as well as their international affiliates, all of which confirm this info is not disclosed by Toshiba. When asked why, their response was they feel “this specification is not necessary for consumer HDD’s”. I am hopeful if several folks inquire they’ll release the spec. As it stands this non-disclosure is quite concerning. If the MTBF for X300’s is say 500K and the load/ unload is 100K…WOW this would be a huge concern considering the present norm is 1,000,000 and 300K respectively. Pls express your concerns or forward your inquiry directly to Toshiba: answer@webcom.toshiba.co.jp

  • Elmer Gloo

    This is very helpful information. Thanks!

  • 한대희

    Hi, I’m currently running a survival analysis on the 2016 Q3 data. What bothers me is that some drives have no record on some days even though they had not failed before.
    For example, considering only the failure indicator, one drive has
    07-01 07-02 07-03 07-04 07-05 07-06
    0 NA NA NA 0 1

    Are these gaps only a matter of record keeping (like missing S.M.A.R.T. data)? Or is there another reason for the missing data (like scheduled maintenance)?

    Thank you

  • Wireball

    Perhaps you could star the percentage failure rates that are uncertain. E.g.: Toshiba 3TB drive – 8.63%*

    (I believe there are also ways to calculate statistical significance, but I am not a statistician.)

  • I am wondering what is so wonderbar about this. Look at the chart, do the math. Pick any drive, I will use the top one as the example. There was a total of 19 drive failures, sure that sounds not bad, OH but wait, that’s in less than 100 days… now that is a WHOLY CRAP now, isn’t it.

  • AZZAZ RACHID

    Hello. I am a PhD student in Statistics at Hight National School of Statistics and Applied Economics, Algeria. I’ve been following your work since 2015 and I always read your updates, because my research theme is statistical models in reliability. The practical part of my thesis applies a mixed model to HDD reliability data in order to get more modeling flexibility, so I have prepared your data by serial number in order to follow every hard drive separately. I want to use your data, but I need written permission from your laboratory. This is a request from my supervisor, who insists on the source and reliability of the data used in my research. So I have the honor to ask you to help me by sending me an authenticated authorization that includes the permission and notes on the reliability of your data. I will send you all the results I reach, and a copy of my PhD thesis when I finish it.
    thank you for your helps ,
    Thank you for your good work
    warmest greetings from me and my supervisor.

    • Have you completed this yet? I would love to see your findings.

  • solars

    Based on one of these reviews I bought two HGST 0S03665 4TB harddrives instead of WD. One of them failed after 8 months, the other one after 3 weeks.

    Not sure how this adds up to the failure rate posted here..

    • Dilldeezy

      Statistical data ≠ your actual experience.

  • Dear Andy ,
    great information you have provided about the hard disk. i was searching something like this.

    • Buddy

      unsubscribe

    • Buddy

      “unsubscribe”

  • cataria

    i got a question, why do the hms5c4040ble640 4tb from hgst have a massive 20.29% annualized failure rate in the 3/31/2014 (1 year) period and then go to below half a percent, being among the top hdds the next years?
    i see 494 drives were used that year, which is not too small an amount for this to be a coincidence.
    am i reading the charts wrong?
    i’m using 2 of these hdds myself, so quite curious.

  • Nigel Fisher

    Multi-axial reports would be nice:

    http://i.imgur.com/VMvyW7C.png

  • Kasun Rajapaksha

    I’m going to buy your service in next month. Also going to buy a HGST 4 TB drive to replace my Toshiba 500GB currently using and going to fail soon. My PC typically runs 20 h/day. so your stats matches me very well. Thanks a lot.

  • Sam Iles

    Had to say it again…Thank You guys for sharing all this data, great stuff, keep up the good work

    :-D

  • traumadog

    Personally, I agree with the Ross Lazarus analysis of the data – and for an individual looking to purchase a solitary hard drive or two, the Kaplan-Meier survival plot gives a much better indication of how any one particular drive maker or model will perform from time zero than the “annualized failure rate” calculations you guys did.

    Regardless, props for having one of the few – if only – publicly available data on hard drive longevity out there.

  • kbadk

    I know I’m late to the party here, but how come you’ll “continue to buy our drives from Seagate and HGST” when Seagate seems to fail almost 10 times as often as HGST? The numbers just scream HGST to me. Am I missing something?

    • Scott

      In a previous blog they described how it is still more cost effective to buy the cheap seagate drives and replace them when they fail than to buy the more expensive drives, (and they’re more readily available in large quantities)

      • kbadk

        I see. Thank you. I’ll be ordering some HGSTs then.

  • douglas deodato

    nice blog, just a small suggestion: instead of images for the lists, why not use text to be more searchable on google? thanks for the tips.

  • Please post real tables – not images. This makes it impossible to copy-paste model names.

  • Ethan Beaver

    What diagnostic / monitoring tools do you use to anticipate a hard drive failing? Thanks.

  • Kaname Fujiwara

    A well done public service! Thank you!

  • someReader

    Hi, are you going to release hard drive reliability for Q2 2016? Regards

  • Amazing! many thanks

    Would be glad to see failure rate vs life time (even if it’s more or less) – would be very helpful in picking the right hard drive

  • Manjunath J Hassan

    This is very useful. Thanks a lot.
    I am doing a project where i am developing SAS disk failure predictive algorithm.
    Do you have anything written on SAS disks? Currently I am using SCSI log pages to analyze the behavior of the disks. (I have some 300,000 SAS disk samples.)

  • Dahc Renrut

    Do you use any SSD drives or are all of the drives traditional mechanical drives? Im guessing that it is not cost effective, YET, to go SSD.

  • Itechstorm

    What a great service you guys provide to the community! Can’t thank enough. Currently subscribing to your service. Happy customer here.

  • Buddy

    So. Out of all of this data, some extensive, some not, are you ready to draw conclusions about reliability? Should we all be buying HGST drives when given the chance?

    • Dilldeezy

      I don’t interpret the data that way at all, and I think in several of his blog posts he has stated the same thing. As a whole, they have to balance reliability with the labor cost of swapping out bad drives, the raw costs of the drives {Seagate can be bid out at significantly lower costs}, and the availability of the drives in bulk {they buy 10,000+ drives at a time, but can’t always get them from Toshiba and HGST}.

      • Buddy

        unsubscribe

  • Thank you for this! Can you provide the Cumulative hard drive reliability rates in a downloadable spreadsheet so we may more easily sort them in various ways to help make drive purchasing decisions?

  • Yuefeng Gao

    Thanks for continuing to publish the hard drive failure data. Have you received any pressure from the hard drive manufacturers asking you not to compare their products with others?

  • John Doe

    I leave my desktop computer on almost 24/7. Which HDD would work best? Would the HGST Enterprise disk be best suited for a desktop that is left running? If not, which one should I get?

  • Frank Underboob

    After having a second Seagate drive fail on me with no warning whatsoever, I really wish I’d had this data available to me a year ago. :(
    Thanks so much for publishing it – my new HGST drive will hopefully be as reliable for me as they were for you.

    • Calvin Dodge

      The first such Backblaze post convinced me to mirror all of my Seagate drives with other brands. A couple of months later, one of those Seagates failed. Fortunately, the mirror drive was fine, so I ditched the dead drive and rebuilt the array with another HGST.

  • Môùàd X Pàlómå

    59

  • ELPCU

    Hmm, Thanks for valuable information, it is very helpful. very very bery berry, I mean.

    By the way, it is a bit disappointing to not see “average age” of the drives in the chart.

    I am very interested to see average age, especially age of failed drives. :P

    • Andy Klein

      We’ve included average age in the charts of some of the prior posts: you can find links to all those posts here: http://www.backblaze.com/b2/hard-drive-test-data.html. I vary what goes into each post a little bit each time. I’ll include average age the next time…

  • sadsongs

    Thank you for posting this. I think it has potential to be a great service to the world of hard drive buyers! But— Like the report from 2014, skimming, the statistics here are limited, and I don’t know how anyone in the public can even get any kind of useful analysis conclusion without more data.

    Mixing old and new drives of the same model into the batch confuses things.

    And, it is simply unfair to compare a 5 year-old drive to a 1-year-old drive! Of *course*, the older the drives, the higher the failure rate.

    What is needed is more information on drive ages, and a coherent presentation of that data.

    How long drives have been running is crucial, but the way that information is listed leaves one baffled.

    It didn’t seem that the 2014 data was properly analyzed, and so an “annualized” rate here may or may not be trusted.

    If you aren’t willing or able to provide the needed stats for someone here to clearly analyze, how do we know your logic is being applied properly to your own analysis, regardless of how long you’ve been doing this or how many drives you’ve installed?

    I only looked at the tables here (vs. reading all the text) to see if I could analyze the info, but, again, not enough info is there for me to make heads or tails of it.

    The following statement in a numbered note here is, IMO, misstated: “Failure rates with a small number of failures can be misleading. For example, the 8.65% [but 8.63% in the tables, I guess] failure rate of the Toshiba 3TB drives is based on one failure. That’s not enough data to make a decision.” Until I compared the statement to the tables, I was simply shaking my head. What it *should* say instead is that compared with so many *other* drive models they have, “Failure rates with a small number of DRIVES can be misleading.” That makes sense.

    At least the 2014 page gave us years that certain models have been in service (though even that was not clear in the tables I looked at). In the tables here, such years are not mentioned. Without that information, no reasonable analysis and conclusions can be drawn.

    Please expand your tables to include all the needed information, and more. It could be a great service to the community abroad!

  • louis925

    Can you also show the uncertainty of your data?

  • crit222

    Thank you for publishing the data openly.

    I have a few questions about the data set; is there an email address to which I can send my queries?

  • John F. Braun

    While I think you have a large enough population of HGST and Seagate drives (tens of thousands) to draw some valid conclusions regarding reliability, I’m a bit concerned that you don’t have quite enough data points for the Toshiba (hundreds) and WDC (a thousand or so) to draw a valid conclusion regarding their reliability. Thoughts?

    • Andy Klein

      Fair statement about the number of drives being needed to draw any conclusions. We try to include the number of drives so the reader can decide what, if anything, they want to conclude. I’d rather present the information and let someone remove it from their consideration based on their own criteria. As you read through the comments, you’ll notice lots of different criteria…

    • Milk Manson

      How can something be worse than nothing?

  • We can get you a deal on 500+ 2TB drives if you’re interested. Contact us at http://www.jtgsystems.com

    • We mostly buy 4tb+ nowadays, but thank you!

      • What would be your Quantity on them? I might be able to get u a supplier.

        • sadsongs

          Try 1200 at a time, or 5,000, or 10,000, based on something I read here or there.

  • 2005OEFArmy .

    As always, don’t forget to mention that almost 50% of your HGST population are enterprise class drives, while all Seagate drives are desktop drives that were not designed to run in enterprise applications and at enterprise levels of workloads.

    • The majority of our Hitachi/HGST drives came from the Deskstar line, which are not enterprise grade. In fact we try to avoid enterprise drives whenever possible, but there’s a chance we did buy some batches of enterprise drives for testing.

      • 2005OEFArmy .

        10211 of 22731 of HGST drives were Megascale/Ultrastar enterprise class, which is 44.92%. So YES, the majority are desktop drives, however it’s much closer to half. Also, it’s 44.92% more enterprise class drives than Seagate has represented.

      • David Moor

        “In fact we try to avoid enterprise drives whenever possible”

        This wording implies that there is something other than cost involved. Can you elaborate?

        • Most of it is actually cost related, but there is a little bit of drive stability. Enterprise drives tend to shut down when they encounter any error, because they assume that there are other drives in a RAID, so they “give up” on a read more frequently. Consumer drives tend to stick around for a while because they “think” they’re the only drive there. Our environment is cool with failure, so we want those drives to live for as long as possible. Now, the above might not actually be accurate, but it tends to mirror what we found when we tested some enterprise drives. At the end of the day though, it’s cost: they are usually more expensive, and b/c we don’t care if the drives fail, the extra expense is not worth it.

  • John

    This is mis-information propaganda from seagate. I have been using seagate and WD for 10 years. I have watched 3 seagates die and take data with them, I have never seen a WD Fail. Use this mis-information at your peril.

  • Greg Zeng

    https://bioinformare.blogspot.com.au/2016/02/survival-analysis-of-hard-disk-drive.html?showComment=1463657503026#c1761665941709149726

    revealed the “efficiencies” of the brands to me. The Japanese brands perform better than the USA brands. So WD bought HGST, the best performer. HGST is now totally owned in every way now by WD, so will the worst performer now and the best performer move towards the mean?

    https://en.wikipedia.org/wiki/HGST

    Outsiders like myself are wondering if and when South Korea and China will enter these charts. Unfortunately these charts do not cover the nations of manufacture of the products, … yet.

    Ownership and brand-origin of the brands seem to show patterns in the above charts. I am guessing that all items are made in factories based in East Asia, including Thailand, Singapore, Vietnam & China? Perhaps the nation of final assembly of the metal units might show interesting patterns?

    The developed nations like Australia (where we live now), USA, etc. have lost most of our factory creativity. Will East Asia be able to better our abilities?

    • 2005OEFArmy .

      Which is exactly why Backblaze should come under scrutiny. There are a few points about this data set. 1) Unfair comparison of Seagate desktop hard disk drives to HGST enterprise class HDDs. 2) No real conclusion can be made in regards to Toshiba drives because the sample size is too small, and in fact the same goes for WD in comparison to Seagate and HGST. In general, as the sample size grows the variance decreases, meaning that they could have easily “lucked” into a much better sample of Toshiba drives or WD drives than is truly representative of the population. Even the sample size difference between HGST and Seagate of 36% is enough to question the actual results. 3) Using power-on hours as a reliability metric (yes, the drives are warrantied in years) and at the very least not scaling them for workload. 4) Even within the same system, drives can be used differently and not necessarily just workload wise. For example, drives located closer to the cooling fans are more prone to errors because of vibration, some companies write to the same physical set of tracks over and over, which wears out the media in that section, etc. 5) Whether the drives were brand new or have previously been through qualification testing, etc.

      • Jeff

        It looks like they have a fairly equal mix of Hitachi Deskstar vs Hitachi Megascale drives, and it looks like the Deskstar drives have excellent reliability.

        • 2005OEFArmy .

          Megascale drives are enterprise class drives.

          • Jeff

            Right, my point is that they have a fairly equal mix of enterprise and consumer Hitachi/HGST drives, and in this case, the enterprise drives don’t have a clear reliability advantage.

            I’m particularly interested in this because I used to use Hitachi Ultrastar drives almost exclusively and have started using Deskstar or Deskstar NAS drives in the last couple years with no ill effects.

      • Milk Manson

        Unless and until you can point us to better data, the scrutiny is all yours.

  • Bio Toxin

    I’d be interested in knowing what portion of drives are new and what specific drives have been in service longest. Knowing if newer drives have better rates or exactly when older drives begin to fail for better planning.

  • Ross Lazarus

    Thanks for releasing the additional data and for mentioning my KM modelling.

    I reran my scripts as soon as I got the new data and have updated the KM curves and tests – see http://bioinformare.blogspot.com.au/2016/05/survival-analysis-of-hard-disk-drive.html

    Surprisingly few changes which makes me more confident that the method is robust. Also pleased to see that all the code worked fine. I have a python script to scrape the failure times I need and then an R script to do the plots and test the models. Happy to share if anyone cares.

    Ross Lazarus
    Pubs: http://scholar.google.com/citations?hl=en&user=UCUuEM4AAAAJ

    • Andy Klein

      You’re welcome, keep up the good work. And thanks for updating your research.

  • LiveJoX

    Go SSD. Much reliable and much faster. End of story.

    • Elliot Clowes

      That would be totally insane. SSD’s are at least 4x the price of HDD’s. And SSD’s really aren’t necessary for the type of mass storage Backblaze needs, as their speed isn’t really needed.

    • Milk Manson

      SSD much not reliable. End of story.

  • Mike S

    Great article, thank you.

  • ender

    Love your work guys, really awesome reading here.

  • FollowTheORI

    Thank you Guys again for the good work and informing the public!

    This is irreplacable now…

    Keep up! :)

  • Kyle

    Whatever happened to those 8TB Seagate SMR drives Backblaze got a while back? I’ll take them off your hands if you’re not using them.

    • Aeoran

      +1.

      I think I’m not alone in being very curious as to how reliable this type of drive is in the Backblaze environment.

      I would suspect however that the unpredictable write times for a modify-write operation could pose a problem for a backup solution like Backblaze, unless you guys archive every version of a user’s files.

    • Andy Klein

      We did test the SMR drives in our test lab recently. They do not work well in our environment. I am currently rounding up the data and observations and we’ll publish a blog post about our experience. That said, there are places where SMR drives work well, such as archiving and Seagate markets their SMR drives for those types of applications.

      • Ryan .

        please get to writing about them, a google search doesn’t turn up anything about backblaze’s adventure into seagate archive drives.

        • Buddy

          “unsubscribe”

  • Stopher Johnson

    Care to walk us through the differences in the model numbers of the HGST 4TB drives? Particularly the one with 0% failure rate over 3000+ drives? That’s pretty astounding. Do those differ much or at all from the Deskstar models they push in retail stores? Are the 0, .5 and 1% failure rates expected with those drives due to the quality and/or model number that HGST aims for, or are those more or less random fluctuations for more or less identical drives?

    • Andy Klein

      The first table is just for Q1 2016. During Q1, some of the drive models had no failures, hence the 0% failure rate for them. The second table is cumulative over the given time period, that is, all failures over that time period. When looking online or in retail stores at a drive, the drive could be the same as we have listed, but be in a “kit” which could have a different model/part number. You’d have to do a little research to see which drive is in a given retail kit.

    • 2005OEFArmy .

      The HGST drives you are asking about are the Megascale family of drives, which are designed for enterprise applications specifically. If Backblaze would bother purchasing some Seagate enterprise drives for a fair comparison, you would see rather similar results.

      • Apikoros

        I don’t think Seagate paid for this ad though.

      • John Doe

        Do Enterprise = for servers?
        & Deskstar = for desktop?
        What if you leave your desktop on pretty much 24/7? What drive would you recommend then?

        • 2005OEFArmy .

          Workload is more important than whether your drive runs 24/7. As the data suggests, if you can afford to pay a 50% premium you should buy HGST, preferably Ultrastar if there isn’t a big price difference between them and Deskstar class drives. But in all likelihood, even if you run your machine 24/7, your HDD will be spinning idly for about 90% of that time unless you will be using it for a home server and will be accessing it all the time. Also, the biggest thing I can recommend, if you care about your data: always run a second drive that mirrors the first, because no matter who makes the drives they can all fail.

          • Trevor

            Um, why do i need a mirror, if 1) i’m running a RAID5 NAS and more importantly 2) I’m a Backblaze customer….duh

          • Because your system is not on your RAID 5 NAS? :)

          • Trevor

            right…and i don’t use Backblaze. Why am i reading this Blog then? Screw backups!

          • DrMuggg

            yeah, tough guys don’t need to do backups….

            …we just cry when nobody can see us…

        • Milk Manson

          And what if I turn my servers off at night, should I swap out the enterprise drives for desktop drives? I think I better…

      • Calvin Dodge

        In past blogs Andy has stated that Backblaze has seen no appreciable difference in reliability between consumer and enterprise drives.

  • Phil

    I’d love to see a failures per day graph next time you do the report

  • Rado

    Guys,

    I’ll buy one year of your service this week, JUST to support posts like this. Thank you for this.

  • Analysing the data carefully, it seems the best cost x benefit is HGST’s HMS5C4040BLE640, correct?

  • nalvarez

    Do you run your disks 24/7 or do you spin them down occasionally? In particular for backups it seems like a lot of data would be write-only until the user loses his local data and needs to restore.

    • They’re constantly spinning. Because of the restore process, they have to be available whenever a customer might need the data back. Theoretically, with the Vaults, we could spin down cabinets at a time, but we have not had a reason to do so.

      • nalvarez

        I for one wouldn’t mind waiting a minute for disks to spin back up in order to begin a restore process that will take hours anyway :)

        • Andy Klein

          We also do restores for the Backblaze mobile app. Most folks on mobile won’t wait a couple of hours for a restore :)

      • BC Pro Truckers

        Why not spin down the IDLE pods and store the first five minutes of data transfer on an ECC RAM drive (chosen for lowest cost per GB of ECC RAM), mirrored to the pod volume, or use a CACHE SSD if you are worried about UPS backup capacity limits? If a pod goes idle and is suddenly needed, the RAM drive would cache the most recent 25% of files, or the changed blocks of those files. By the time the transfer from cache completes, all the disks would be fully spinning and ready to transfer the remaining data from the volume mirror. This would be a much more environmentally friendly approach, and it reduces wear and tear.

        • cataria

          wouldn’t this create tons of load/unload cycles?

          and shorten the lifespan of the disks a lot?

  • Shouldn’t you measure the failure rate vs years in service, instead of failure rate vs calendar years? I’d love to see that graph, since the decrease of failure rate vs (calendar) year is totally counter-intuitive.

    • Andy Klein

      Failure rates are computed based on drive hours (or drive days) for a given drive model. There’s a description of how we compute the failure rate most recently in this post: https://www.backblaze.com/blog…. The formula we use is (100*drive-failures)/(drive-hours/24/365)

  • Lars Viklund

    What is your strategy when it comes to keeping the firmware of your disks up to date? We find it’s one of the most important ones in a system to keep fresh and to bully vendors about.

    • We’ve absolutely done firmware upgrades in the past. Mostly for stability or longevity fixes. If they’re compatibility fixes we don’t typically bother with them since our set-up is so different. But yes, if we think a firmware update will help us we’ll run some tests and then update the fleet.

  • Odysseus Ithacan

    The question I have is why Backblaze is using 5400rpm 3.5″ HGST drives. These are relatively slow compared to 7200rpm drives. Has BB chosen these for greater reliability?

    • Andy Klein

      The short answer: we don’t need the speed of the 7200rpm drives, so why buy them? We do occasionally buy 7200rpm drives when it makes financial sense (i.e. low cost/MB). Also, the 5400rpm drives typically use less electricity than the 7200rpm drives.

      • Odysseus Ithacan

        I’m sure that when you’re dealing with thousands of drives, the lower power consumption is significant. However, you’re saying that the speed difference (often significant) doesn’t make a difference? Is that because of bandwidth limitations?

        • Andy Klein

          No bandwidth limitations. The workload is dispersed over a large number of Storage Pods/Vaults so any given read or write is not impeded.

          • Falcon89

            Would it be fair to say that the bottleneck is the amount of data your users want to back up? As I understand it, you have chosen to have 2X servers fill up in 2Y days, instead of X servers in Y days? You don’t care how fast they load because you always have many servers accepting new data. As I remember it, each new stored object gets split into 20 shards (17 data, 3 parity) and the incoming data is load balanced anyway, so speed to fill a vault or pod is irrelevant. Am I anywhere in the neighborhood?
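The 17 data + 3 parity split mentioned above can be put into numbers with a short sketch. It takes the shard counts from the comment as given and assumes an erasure code in which any 17 of the 20 shards are enough to rebuild the object (as with Reed-Solomon); the function name `shard_layout` is made up for illustration.

```python
def shard_layout(data_shards: int, parity_shards: int) -> dict:
    """Summarize a data+parity shard layout.

    Assumes a code where any `data_shards` of the total shards can
    reconstruct the original object (e.g. Reed-Solomon).
    """
    total = data_shards + parity_shards
    return {
        "total_shards": total,
        # Raw bytes stored per byte of customer data.
        "storage_overhead": total / data_shards,
        # Shards (drives/pods) that can be lost without losing data.
        "max_lost_shards": parity_shards,
    }


layout = shard_layout(17, 3)
print(layout["total_shards"])                # 20
print(round(layout["storage_overhead"], 3))  # 1.176
print(layout["max_lost_shards"])             # 3
```

Under those assumptions each object costs about 18% extra raw storage and survives the loss of any three shards.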

  • Kevin Samuel Coleman

    How is 266 failures out of 61,523 drives a 1.84% failure rate? That should be a 0.43% failure rate. All of your failure rates are much higher than the actual numbers you show are.

    • Andy Klein

      Failure rates are computed based on drive hours (or drive days) for a given drive model. There’s a description of how we compute the failure rate most recently in this post: https://www.backblaze.com/blog/hard-drive-reliability-q4-2015. The formula we use is (100*drive-failures)/(drive-hours/24/365)
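A minimal sketch of that formula, to show why 266 failures out of 61,523 drives is not a 0.43% annual rate. The failure and drive counts are the ones quoted in this thread. The drive-hours figure is an assumption made only for illustration (every drive running for all 91 days of Q1 2016); the real drive-hours are not given in the thread.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def annualized_failure_rate(drive_failures: int, drive_hours: float) -> float:
    """(100 * drive-failures) / (drive-hours / 24 / 365), as a percentage."""
    drive_years = drive_hours / HOURS_PER_DAY / DAYS_PER_YEAR
    return 100 * drive_failures / drive_years


# Figures quoted in the thread.
failures = 266
drives = 61_523

# ASSUMPTION: every drive was in service for the whole quarter (91 days).
days_in_quarter = 91
drive_hours = drives * days_in_quarter * HOURS_PER_DAY

# Share of drives that failed during the quarter itself.
quarterly_percent = 100 * failures / drives

# The same failures expressed per drive-year of service.
afr_percent = annualized_failure_rate(failures, drive_hours)

print(f"Failed during the quarter: {quarterly_percent:.2f}%")  # 0.43%
print(f"Annualized failure rate:   {afr_percent:.2f}%")        # 1.73%
```

The 0.43% figure is the share of drives that failed in three months; scaling the same failures to a full year of service gives roughly four times that. The reported 1.84% is a little higher than the 1.73% here, presumably because not every drive was in service for the full quarter, so the actual drive-hours are lower than this sketch assumes.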

      • Kevin Samuel Coleman

        There’s gotta be a better name than failure rate, because many people reading this will assume that 9% of the Seagate drives they purchase for uses they aren’t intended for (consumer drives in enterprise storage applications like yours) will fail, when actually a much smaller % of those will fail.

        • Ben Mitchell

          Kevin, for whatever reason the row percentages are calculated differently than the way you’d expect. 9% of the 4TB Seagate model will fail IF 5 more drives fail per quarter in 2016: 5 drives failed in 3 months x 4 quarters / 207 drives in use. That definitely is a drive to avoid.

      • Stefan Sonnenberg-carstens

        The dimension is not percentage: it is 100-drive-failures per days in service. Multiplying by 100 does not make it one hundredth. Which is to be expected from the data presented. So, the metric is useless by itself.

        A lack of understanding of mathematics shows itself in excessively precise calculations (C.F. Gauss)

        • Andy Klein

          The link below explains how we compute failure rates with a bit more detail.
          https://f001.backblaze.com/file/Backblaze_Blog/hard-drive-stats/computing_failure_rates+copy.pdf

          • Stefan Sonnenberg-carstens

            No, it just shows how the dimensions time, number of drives and number of failures are merged in a mysterious way.

            I think this is not the right place for a discussion :-)

        • 2005OEFArmy .

          I absolutely agree. Not to mention that once a drive has run for a year, the annual failure rate becomes the actual failure rate, not the normalized expected failure rate that the formula is trying to predict. Also, SMART power-on hours have little to do with actual drive reliability, while the actual workload is the driving factor for drive failure (or absolute lack thereof). Enterprise class HDDs perform better in 70-80% workload applications than at 100% idle or 100% workload (I can go into more detail upon request). Also, desktop drives are not really designed to run in enterprise applications due to a number of factors, which include inherent mechanical design as well as error recovery and tolerances.

          • tbbtg

            Enterprise drives are not any more reliable than desktop drives, it’s all just marketing BS.
            See this post right here in this blog about enterprise HDD reliability: https://www.backblaze.com/blog/enterprise-drive-reliability/

          • 2005OEFArmy .

            Trust me, enterprise drives are a lot more reliable than desktop drives, which is a big reason why they have a significantly higher MTBF as well as a 5 year warranty vs 1-2 years.

          • Scott

            Do you have some empirical evidence of this beyond what Backblaze has already demonstrated? Just saying “Trust me” doesn’t offer much insight as to why you think they’re more reliable.

          • Peace Love

            Yep. Enterprise drives offer a longer warranty, but that is not related to data safety.

          • Dilldeezy

            My personal experience with small business seems to indicate the same thing, but I have not done any scientific studies. BB’s huge amount of data seems to prove otherwise.

        • ElC

          It’s fine, the units cancel out, don’t waste people’s time.

          Units are:
          1/(hr/((hr/day)*(day/yr)))
          --> 1/yr, fraction failing per year, multiply it by 100 to get a %