What Can 49,056 Hard Drives Tell Us? Hard Drive Reliability Stats for Q3 2015

By Andy Klein | October 14th, 2015

Q3 2015 Hard Drive Reliability Stats
As of the end of Q3 2015, there were 50,228 drives spinning in the Backblaze datacenter. Subtracting boot drives, drive models with fewer than 45 drives, and drives in testing systems, we are publishing data on 49,056 hard drives spread across 26 different models, varying from 1.0TB to 8.0TB in size.

What’s New for the Q3 2015 Results?

In this edition, we are publishing the data on our 1TB drives for the first time. The data was always available in the data files we publish on our Hard Drive Data web page, but now we’re reporting it here too. We are also including the “Average Drive Age” for each model, and we’ll summarize the data by manufacturer as well.

Hard Drive Failure Rates

Let’s start by breaking down the drives by size and comparing them over time:

[Table: Hard drive failure rates by drive size and model, 2013 through Q3 2015]
There’s a lot going on in the chart above, so here are a few notes to help out:

  • The 2013, 2014, and 2015 failure rates are cumulative for the given year. In the case of 2015 that is through Q3 (September).
  • If the failure rate is listed as 0.00%, there were drives in use, but none of them failed during that period.
  • If the failure rate is blank, there were no drives in use during that period.
  • The “All Periods” failure rates are cumulative for all data (2013-Q3 2015).
  • The “Max # in Service” column is the maximum number of drives ever in service for the given hard drive model.
  • The “Avg Age (Months)” column is the average age of all the hard drives of the given hard drive model. This is based on SMART 9 data.
  • If the “Avg Age (Months)” data is 0.0, the given drive model was not in service during 2015, making the value difficult to compute. (We’ll try to figure out a better way to compute this value by the next report.)
  • The HGST (*) model name – we’ve been asked to use HGST in place of Hitachi and we are honoring that request, but these drives report their model as Hitachi and are listed as such in the data files.
  • The Low Rate and High Rate are the lower and upper bounds of the confidence interval for the listed failure rate.
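
For readers who want to reproduce figures like these from the raw data, below is a minimal sketch of computing an annualized failure rate and a rough confidence interval from drive-days and failure counts. The interval here is a simple normal approximation to a Poisson rate, used purely for illustration; it is not necessarily the exact method behind the Low Rate / High Rate columns above.

```python
import math

def annualized_failure_rate(drive_days, failures):
    """Failures per drive-year, expressed as a percentage."""
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

def poisson_rate_interval(drive_days, failures, z=1.96):
    """Approximate 95% low/high bounds on the annualized failure rate,
    treating the failure count as Poisson (normal approximation)."""
    drive_years = drive_days / 365.0
    rate = failures / drive_years
    half_width = z * math.sqrt(failures) / drive_years
    return 100.0 * max(0.0, rate - half_width), 100.0 * (rate + half_width)

# Hypothetical example: 45 drives in service for a full year, 1 failure.
days = 45 * 365
print(annualized_failure_rate(days, 1))   # ~2.2% AFR
print(poisson_rate_interval(days, 1))     # wide interval, as expected with a single failure
```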

If the chart is too much data all at once, you can download a ZIP file containing a Microsoft Excel file of the data from the chart. Then you can parse the facts and figures at your leisure.

Some Observations Based on This Data

  • In the chart below are the failure stats for all of the drives in the review, broken down by year. We’ve also computed the average failure rate across all periods for all drives: 4.81%.

[Chart: Hard drive failure rates by year, with the all-period average of 4.81%]

  • The Western Digital 1TB drives in use are nearly 6 years old on average. There are several drives with nearly 7 years of service. It wasn’t until 2015 that their failure rate rose above the annual average for all drives. This makes sense given the “bathtub” curve of drive failure, where drives over 4 years old start to fail at a higher rate. Still, the WD 1TB drives have performed well for a long time.
  • Nearly all of the 1TB and 1.5TB drives were installed in Storage Pod 1.0 chassis. Yet, these two sizes have very different failure rates.
  • Nearly all of the 2TB and 3TB drives were installed in Storage Pod 2.0 chassis. Yet, these two drive sizes have very different failure rates.
  • Always consider the number of drives (Max # in Service) when looking at the failure rate. For example, the 1.5TB Seagate Barracuda Green drive has a failure rate of 130.9%, but that is based on only 51 drives. We tested these Seagate drives in one Storage Pod in our environment and they were not a good fit. In general, we’ve found it takes at least 6 Storage Pods’ worth of drives (270 drives) to get a good sense of how a given drive will perform in our environment.
  • 4TB drives, regardless of their manufacturer, are performing well. The 2.10% overall failure rate means that over the course of a year, we have to replace only one drive in a Storage Pod filled with these drives. In other words, on average, a pod comes down for maintenance once a year due to drive failure. The math: 2% is 1 out of 50. There are 45 drives in a pod, so about once a year, one of those 45 drives, on average, will fail. Yes, the math is approximate, but you get the idea (there’s a short sketch of it after this list).
  • 6TB drives, especially the Seagate drives, are also performing well, on par with the 4TB drives so far. The 6TB drives give us 270TB Storage Pods, 50% more storage at the same overall cost per GB.
  • The 5TB and 8TB drives are performing well, but we only have 45 of each in testing, not enough to feel confident in the numbers yet as can be seen in the confidence interval (low rate/high rate) of these drives.
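
To make the Storage Pod math above concrete, here is a tiny sketch (the 2.10% rate and the 45-drive pod size come straight from the post; the rest is arithmetic):

```python
afr = 0.021          # 2.10% annualized failure rate for the 4TB drives
drives_per_pod = 45  # drives in one Storage Pod

expected_failures = afr * drives_per_pod
print(expected_failures)  # ~0.95, i.e. roughly one drive failure per pod per year
```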

Drive Failures by Manufacturer

Below is the chart of failure percentages by manufacturer. This is for all the drives in this analysis for the years noted:

[Chart: Hard drive failure rates by manufacturer, 2013 through Q3 2015]

Our Environment

All the hard drives in this review are used in the production systems in our datacenter. The environment is climate controlled and all drives are individually monitored. Each day we pull the available SMART stats reported by each and every drive. These stats are available for download from our Hard Drive Data web page. Those stats form the basis for this blog post. The data files are large, but if you give them a try, please let us know if you find anything interesting.
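
If you want to poke at the downloadable data yourself, here is a minimal sketch of reading one daily snapshot and summarizing it by model. It assumes the published CSV layout with date, serial_number, model, failure, and raw SMART columns (for example smart_9_raw for power-on hours); adjust the file name and columns to match the files you actually download.

```python
import pandas as pd

# One daily snapshot from the published data set (file name is illustrative).
df = pd.read_csv("2015-09-30.csv")

# Drive and failure counts reported on this day, per model.
summary = df.groupby("model").agg(
    drives=("serial_number", "nunique"),
    failures=("failure", "sum"),
)

# Rough average age in months from SMART 9 (power-on hours), where reported.
if "smart_9_raw" in df.columns:
    summary["avg_age_months"] = df.groupby("model")["smart_9_raw"].mean() / (24 * 30)

print(summary.sort_values("drives", ascending=False).head(10))
```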

Andy Klein

Director of Product Marketing at Backblaze
Andy has 20+ years experience in technology marketing. He has shared his expertise in computer security and data backup at the Federal Trade Commission, Rootstech, RSA and over 100 other events. His current passion is to get everyone to back up their data before it's too late.
Category:  Cloud Storage
  • albundy57

    Thank you, you beautiful bastards, for giving us the real down and dirty on actual functioning drives. I’m very tired of some manufacturers claiming 2,000,000 hours MTBF in their literature while their arrays are being rebuilt almost every other week due to failure. My own experiences directly relate to your statistical findings! Going to buy some Hitachi drives today for a new audio-visual NAS device.

  • Oliver St.John-Mollusc

    I have a 1TB WD Blue drive used only for backups that is goosed after only 26 days and 10 hours of “POH” (power-on hours).

  • Jus’ Sayin’

    I ran across an impressive second take on Backblaze’s data.

    Kudos and thanks to both Backblaze and the scientist.

    https://bioinformare.blogspot.com/2016/02/survival-analysis-of-hard-disk-drive.html

  • Karmakat

    Ok, does anyone have a recommendation on a hard drive (At least 3TB total) that would hold up well under constant media streaming use? I currently have a 1.5TB WD that is from 2009 that I have used constantly (24/7 streaming of audio and video media over a network) and it hasn’t given me any trouble until a couple days ago and now I have to figure out what to get to replace it. The big thing is that it needs to be fast enough to stream without lagging and be able to handle heavy, heavy constant use. Any accurate info would be helpful!

  • 2005OEFArmy .

    I know it’s bad to bump an old blog, but: 1) 50% of HGST drives were enterprise-class drives. 2) You can’t compare such drastically different sample sizes of HGST+Seagate drives vs Toshiba+WD because, as I assume you know, a larger sample size decreases the variance of the sample space. Meaning that Backblaze could have easily “varied” into a better-than-average population of Toshiba drives (there were only 200-some) than Seagate drives (31,000). 3) You should not be running desktop drives in enterprise applications, and the fact that you do so should have your customer base outraged.

  • Panayiotis Black

    The graph does not even say half the truth.
    WD sold 204M HDDs in 2015, and HGST 76M.

    According to this and your numbers, the failure ratio between them is 2.14/1.
    But their production difference is 2.68/1.

    Furthermore, the 8% of WD broken disks could have been up and running for over 7 years.
    So why bother writing an article that mirrors the truth in absolute numbers, but has no real value when it comes down to comparing reliability?

  • Khan

    Great site – thanks for this amazing amount of info!

  • Chris Butler

    What a phenomenally useful quarterly post. Thank you!

    I am about to replace 8 SATA desktop-class drives in 8 High-Rely removable drive carriers that we use with our Tandem DXR dual-bay removable storage system. They are used every night in a rotation of supplemental full backups that are taken to a secure offsite facility.

    Up until I found this blog, I dreaded the idea of researching hard drive reliability based on “reviews,” which usually boil down to either first-day performance comparisons, or Amazon/Newegg/etc. “reviews” which are usually only written by people who have a defective product or are angry with the online retailer’s handling of the purchase.

    Only your unique environment could really give useful data to a hard drive market segment that is by nature usually small-scale in real world situations.

    Thank you so much, this really likely will save us many headaches years down the road.

  • Richard Jackson

    Would be keen to see the date of manufacture specifically, rather than year of sale/purchase.
    I’ve been a WD fan for years, but technically Seagate’s 2015 results are looking much, much better, even than WD’s.

  • Soldier

    Stick with the HGST brand. Not only can these usually be found cheaper online than working Seagate hard drives, but they are the most reliable brand out of all the hard drives. I’ve never had an HGST brand drive die on me. I just bought a new 3TB 7200RPM HGST for 60 bucks online while used Seagate ones were being sold for a higher price! The hard drive brand of the same terabyte size known to die the quickest is being sold for more $$$ online than the most reliable brand!

  • Chris D

    Thanks for the info. This kind of makes me want to buy HGST, even though they all are basically sharing the same information at this point, HGST seems to be utilizing it properly.

  • I think it would be great to add how many of each hard drive you have. I understand that you are calculating percentages by manufacturer, but maybe you use 10,000 Seagate drives and 100 HGST drives, so Seagate would have a bigger failure rate than HGST.
    Any information about that? Otherwise these rates don’t seem objective to me.

  • cofinoa

    Thank you for sharing this info.

    How has the confidence interval been computed? And at what confidence level?

  • Steve Irons

    Any idea when the 4th quarter result will be up? I got burned really badly on the 3TB Seagates, and have several of them on ice until I can transfer the contents to more dependable drives.

    • Max

      lol savage baby from mad max2

  • Greg Thurtle

    Great information.. it’s very hard to get good figures (or any figures). So this is very welcome.

    You don’t fancy doing the same with Flash based memory cards do you? :)

  • Rotem Rosenberg

    Hello

    I didn’t understand something:
    how can there be more than 100% failure?

  • Munchma Koutchey

    I still don’t understand why you data centre experts keep shoving desktop drives into multi-drive enclosures/racks instead of buying the right types of drives for these high vibration deployments. Is the cost difference that great that 1 or 2 failures per pod is worth it? Why buy cheap Seagate desktop drives instead of Surveillance or Enterprise NAS drives that are designed to handle all the vibration?

  • m4dsk

    Andy, could you please tell us whether most disks labeled as 1 in the data set are actual failures or proactive replacements done by admins? Would there be any chance to distinguish between these 2?

  • I’d love to see an “average age of death” item in the per-model data, might be telling.

    • Paul Biedler

      The better way to describe things is through the reliability at a certain time. MTTF (mean time to failure, e.g. “average age of death”) can often be somewhat misleading, especially for failure modes that have a fairly high infant mortality. That being said, the Weibull distribution gives an “eta” value known as the “characteristic life”; this is, by definition, the point at which 63.2% of the items in the population are expected to fail. This is roughly analogous to an MTTF. Please be aware that with the Weibull distribution you will often get an eta value that is older than the oldest drive that currently exists, which is to be expected when you still have a high proportion of your drives still running. This needs to be taken with somewhat of a grain of salt, since with this data multiple competing failure modes are all grouped together, so a steep wear-out mode may show up and alter the parameters significantly. That being said, here are the modelled predicted characteristic lives for the top 11 drives:

      HGST HMS5C4040ALE640 2,556,572
      HGST HMS5C4040BLE640 154,774,402
      Hitachi HDS5C3030ALA630 404,865
      Hitachi HDS722020ALA330 97,079
      Hitachi HDS723030ALA640 116,343
      Seagate ST3000DM001 24,869
      Seagate ST31500541AS 78,825
      Seagate ST4000DM000 351,038
      Seagate ST6000DX000 1,526,269
      Western Digital WDC WD30EFRX 235,649
      Western Digital WDC WD60EFRX 232,405
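
      A quick numerical check of the 63.2% property (a minimal sketch using scipy; the shape parameter below is arbitrary, since only the eta values are listed here):

      ```python
      from scipy.stats import weibull_min

      eta = 351_038   # characteristic life (hours) listed above for the Seagate ST4000DM000
      beta = 1.2      # arbitrary shape parameter, purely for illustration

      # By definition of the Weibull scale parameter, F(eta) = 1 - exp(-1) ~= 63.2%,
      # regardless of the shape parameter.
      print(weibull_min(c=beta, scale=eta).cdf(eta))  # ~0.632
      ```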

      • Brian Bruinhard

          Can you explain the numbers next to the hard drive models? Are these days/hours/weeks?

        • Paul Biedler

          This is in hours.

        • Paul Biedler

          Yes, sorry, the numbers next to the hard drive models are the “characteristic lives”, in hours

  • I had a conversation with a non-IT Sci-Fi fan who was thinking “once we can upload ourselves, we’ll be immortal!” I had to burst that little bubble by asking some simple questions, like had he ever had a computer die on him? a hard drive fail? lost files without a backup? I don’t think anyone who actually works in IT seriously thinks that a digital upload will mean immortality, unless you’ve got a solid automatically self-healing and self-drive-replacing backup system…

  • Oberoth

    I believe they may not be designed for ‘heavy use’ but are you contemplating using ‘Seagate Archive V2 ST8000AS0002 8TB’ drive?

    I would love to see the failure rates on this drive, plus if it does fare well then it’s surely the ultimate drive for you: super high storage density and an amazing price per GB.

  • MoogleStiltzkin

    Any chance Backblaze will be using the HGST HDN724040ALE640? Seems like they’re the most cost-effective 4TB on the market, so I’m surprised it didn’t make it onto the Backblaze roster :X

    • Preston G

      I have these drives, along with the 32 HDS722020ALA330 drives, dating from 2009 until 2012. I have had one fail (it was dropped before being installed, failed after 4 years).

      Hitachi, and subsequently HGST, seriously rock. The box that these beasts are all installed in is getting hammered daily with reads and writes. I recommend them for Desktop, External and Enterprise use to everyone.

  • JerkfaceMcGee

    Thanks for the data, interesting to see WD in the lead for most failed drives this quarter.

  • VertexWolf

    Thanks for releasing this really important information. It’s nice to see your efforts apparently have spanked Seagate to get their quality in line with the competition!! Now, bring on the 1 Petabyte SSDs using Infinite V-NAND that never wears out!

  • J. Jourard

    I haven’t read every post, but I don’t think it is highlighted enough here that the extremely high Seagate failure stats were connected with the fact that Backblaze had bought a whole bunch of refurb drives, and it was those units that showed abysmal failure rates. New unused ones did not do that. In an earlier blog, this was plain to see. I’m not seeing that important point highlighted here.

    • Got a link for that blog post?

    • timothyhood

      Well, they didn’t actually buy refurb drives. What they did was receive warranty replacements that they suspect may have been refurbed with prior usage. There’s no way to know that, however. In fact, BB’s “refurb” theory, if anything, helps to paint Seagate in a better light. Unless, of course, you need in-warranty replacement of your Seagate drive. Then, you might also expect a shorter life expectancy of the replacement drive as well.

  • disqus_58NLVBV9m3

    Hi!
    Sorry for my English! But I want to ask you: I want a 2 or 3TB hard drive for my PC, for movies, MP3s, some torrents and many ISO files. I use my PC every day, sometimes all day and night. What should I buy?

    Thank you!
    Tom

  • Nick Fusco

    With the consistency of this information over the past few years, I don’t quite understand how HGST hasn’t picked up some market share.
    Also, I would love to see some larger samples of WD drives, I hope that they become viable enough price wise for you to purchase a bunch.

  • Goddard

    Good stuff..

  • Seagate is killing the HDD market. Customers do not care which HDD maker does its best. They just think, “Oh, HDDs are so fragile, the failure rate is 14%!” Seagate should focus on the reliability design of its products.

  • Matthew F

    Funny, as before with their flawed study… they claimed Seagate was the worst… and yet, here they are, they keep buying Seagate drives…

    • HenkPoley

      For their purposes it’s not bad enough to switch. If you have many copies lying around, the fact that there is a 0.04% chance that a drive dies on a given day (with a ~13.5% annual rate) is not substantial. You’ll just recover from any of the other copies.

      And actually the numbers are better than the ones I used above.
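
      As a sketch of where a number like 0.04% per day comes from (assuming the failure risk is spread evenly over the year):

      ```python
      annual_rate = 0.135                              # ~13.5% annualized failure rate
      daily_rate = 1 - (1 - annual_rate) ** (1 / 365)
      print(f"{daily_rate:.4%}")                       # ~0.0397%, i.e. about 0.04% per day
      ```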

    • fencepost

      There’s some discussion of drive prices in an earlier post. At that time, the HGST drives were both more reliable and 30+% more expensive, so they can work with the cheaper drives (and get warranty replacements for them unless they’re lingering “shucked” ones) and still have the total cost be lower than using the more expensive higher-reliability drives.

      Backblaze has good redundancy in place and a simple and efficient system for replacing drives. They can tolerate failures that would be a much bigger problem in smaller setups.

  • Frank Bulk

    It appears that the combination of larger drives with lower failure rates results in a significantly lower failure rate per bit!
    At what point is there a tradeoff between cost and failure rate? If a drive is $10 more but has a 1% lower failure rate, is that “worth” it?

    • Andy Klein

      The trade-off between failure rate and cost is something we consider here, but there are many variables in play, especially in our environment and how our software operates. We are looking at writing a post on this topic to examine these variables and see where that trade-off point lies.

  • John McClane

    I was wondering, Andy, if you believe these statistics also relate somehow to 2.5″ disks? I mean, I need to buy a 1TB external drive now. Is it well-judged that I should focus mostly on HGST/Hitachi drives?

    • Andy Klein

      I don’t think you can correlate this review to 2.5″ drives. Sorry.

    • timmmay

      Definitely not. 2.5″ drives are entirely different beasts.

  • von Levi

    Have you tried to determine why some drives consistently last longer than others? Is it design, parts, etc.? As a consumer is there anything specific I should be looking for when buying drives?

    • Andy Klein

      We have not found the secret sauce as to why one drive lasts and another doesn’t; that’s one of the reasons we track them, so we can see failure patterns and act accordingly. For a consumer, my advice is simple: back your stuff up, multiple times. Have a local backup and a copy or two offsite. Hard drives will fail; have a good backup plan.

      • MightyDrunken

        Well you would say that. ;)

  • Why are you guys using desktop-rated drives for cloud storage and backup? For instance, you only list Seagate Barracuda drives and not the NAS- or Enterprise-rated drives, which would be the drives I would think you’d be using. I would only use NAS-rated or above for my home backup solutions, so why aren’t you?

    • Andy Klein

      We’ve talked in other posts about why we use the drives we do, but basically it comes down to cost. Our environment and software minimize the risk of individual disk failure, so it makes no sense to spend more money on enterprise drives for example. We do use some NAS drives (WD comes to mind), but again only if it makes economic sense.

  • Kyle

    Whatever happened to those 8TB SMR drives you said you would try? It’s been at least 6 months.

    • Andy Klein

      They ended up in the engineering lab, I don’t have anything to report yet.

  • Wes Kalata

    Have you considered using WD Purple drives? I have them in my personal machine and I haven’t had any issues yet (I’ve had them for about 2 years).

    • Andy Klein

      WD Purple 4TB $156, WD Red 4TB $149, Seagate 4TB $123. The cost difference doesn’t make sense for us given our environment. But for a given individual with 1 or 2 drives or a specialized environment, the added cost might be well worth it.

  • Ben Jolly

    Andy, is there much consideration given to disk migrations?

    E.g., at what point is running a pod with 1TB disks no longer economical versus migrating them to 4TB or 6TB disks? I.e., some sort of rack space vs. TB density metric.

    • Andy Klein

      Good question. We do migrate from lower density drives for economic purposes. As part of that we prioritize based on failure rates, so we are migrating from the 1.5TB drives first, given how well the 1TB drives are doing.

      • Meta

        Uhm, your table for the 1.5TB may need to be looked at. I haven’t reviewed it against older data, but I’m pretty sure 129% and 222% failure rates are impossible in this context.

        • timmmay

          That’s YoY

  • Jason Hall

    http://i.imgur.com/5ZyPUCq.png — My personal systems are not supported, and the Business Option seems to require at least 5 machines, while I could easily subvert this requirement with a non-server guest or possibly manipulating the installer… I’d rather not add another layer to my already complex “home” network. Will I be able to use BackBlaze on a Windows Server install as personal use in the future or can an exception be made?

    • Hi Jason! Unfortunately we don’t support servers at this time. We have a new product coming out soon, Backblaze B2 that you might be able to use! You can get more information here -> https://www.backblaze.com/b2/cloud-storage.html

      • Milk Manson

        Speaking of B2, if you guys really want me to spam my entire contact list trying to sign people up, you’re going to have to let me in first.

        • Heh, noted ;-)

    • Goddard

      You know what I did. I just went to my Dad’s house and hooked up a USB drive to his router, and then I got offsite storage for my server.

      • Milk Manson

        Backing up on Thanksgiving and Christmas really isn’t enough.

        • Goddard

          Clearly you don’t understand. If you hook up an external drive to a router it can now be accessed over the internet. Therefore it is a “cloud” drive.

          • Milk Manson

            My bad.

  • timothyhood

    C’mon, don’t be afraid to call Seagate the crap that they are. Your data reflects on a macro level exactly what I experienced. After seeing high failure rates on multiple Seagate drives, I finally had enough. I switched them all out for Hitachi drives and haven’t experienced a single failure since. The remaining Seagate drives were used for migrating data and other occasional use, with only one drive left alive. I would never buy a Seagate again, and I would never use one even if given it for free unless I was 100% confident in my backup solution (which certainly would not involve Seagate drives).

    • phuzz

      The data shows that *some models* of Seagate drives are rubbish, but it also shows that overall there wasn’t much between any of the manufacturers.
      Of course, there’s no easy way to tell that a particular model of drive won’t turn out to be a dog until it’s been in service for at least six months, so the best advice is, buy the drives with the best warranty/replacement program, and always backup your data.

      • Craig Herring

        Completely agree with this response. Been in the industry for 25 years and have seen all sorts of issues. An interesting study shows that IT people ARE the reason for data disasters 43% of the time. All hardware will fail at some point; the industry designs it to fail or else there’s no industry. Microsoft’s latest OS is always the “best.” These are interesting studies and have some value, especially for evaluating scalability and FUD de-validation. If you back up your data you will never lose it because of a dead drive, regardless of brand.

        • Xebozone

          What about Windows ME, Vista, 8.0… not the best :)
          Microsoft has a trend that I realized of Good/Bad/Good OSes…
          But now that W10 is the ‘last one’… we will see how things change.

          • vorpaladin

            I believe his point is that MS always claims the newest OS is the best, when of course all of them are terrible. Although XP and Win7 get honorable mentions.

        • timothyhood

          Sure, back up your data to prevent loss, but what about the cost of having to restore data and replace dead drives? Isn’t it best to try to get the most reliable storage in the first place? And if one brand seems to have double the lemon rate, wouldn’t it make sense to avoid that brand and not take the risk?

      • calmdownbro

        Had tons of drives in my desktops and laptops, servers, literally all the form factors. One thing was consistent. Seagates die. In desktop, in laptop, and in my servers. I mean yes, it’s just some models, but I was damn unlucky to pick all the bad ones.

        Since then I use Toshiba, WD, and I never had a single drive failure. None, zero.
        I only threw out drives that were too old/too small, but they worked perfectly fine until the last minute.

        • vorpaladin

          Every WD drive I’ve owned over the past 10 years has died within a year of purchase with light usage. No more WD for me! I bought 4 of them before I learned my lesson. :(
          And yeah, Seagates are terrible too in general. I see lots of failed Seagate drives at work.

    • Nick

      What strikes me as hilarious is your switch to HGST, as they gave us what was quite likely the least reliable hard drive of all time, the IBM Deskstar 75GXP, aka Deathstars.

      • timothyhood

        Probably because that was 15 years ago, which is 10 generations in the computer world. Kind of like blaming a family’s problems on their great-great-great-great-great-great-great grandfather. Also, that was one model, whereas Seagate has had reliability issues with many, many models over a consistent and longer time period. No manufacturer is perfect, but you can at least go with the odds.

      • Soldier

        lol, holding on to a grudge for 15 years. The point is that they make reliable hard-drives now and HGST brands are offered at a lower price sometimes if you look for deals online. Not only more reliable but cheaper in some cases.

    • galan

      In the past 15 years I’ve owned dozens of Seagate and Western Digital drives. I’ve had numerous failures from both. I don’t love one brand more than the other. I recently had a 2tb WD Green drive die after about 5 years, but I still have some Seagate 250gig drives that I’ve had since 2004-2006 that still work perfectly well. The 7200.11 series drives from Seagate were awful, and they sullied Seagate’s reputation, but their other drives haven’t had significant problems.

  • Mark Swope

    Thank you for sharing this data! Very thought-provoking…

  • Paul van den Bergen

    I’d be interested in seeing stats on your lifetime and decommissioning rate for pods – as opposed to individual drives.

  • Joshua Kugler

    Can you explain the failure rates over 100%? The Seagate Barracuda Green drives.

    • It’s an annualized failure rate, so if a lot of the units fail after less than a year of service, the annualized rate can exceed 100%.

    • Jason Hall

      A drive fails in under a year and is replaced; that drive fails in under a year too: 200% failure rate…
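
      A hypothetical worked example of how an annualized rate ends up above 100% (the numbers are invented for illustration):

      ```python
      # 51 drives run for about 5 months each before the model is pulled,
      # and 26 of them fail in that time (invented numbers).
      drive_days = 51 * 150
      failures = 26

      annualized_rate = failures / (drive_days / 365)
      print(f"{annualized_rate:.0%}")  # ~124%: more failures than accumulated drive-years
      ```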

  • It’s nice to see statistics that mean ABSOLUTELY NOTHING.

    Seriously, any comparison is apples and oranges. You use WD Reds almost exclusively for the WD drives, but you use Seagate LP (aka Green) and desktop drives for all the Seagate drives. Different workloads, different behavior.

    • Definitely! Plus they’re all going into storage chassis. Still fun though, folks love looking at these, plus we have a good time poring over the data ourselves.

      • Only have 76 drives spinning here, but my failure rates are almost an exact match to Backblaze’s! Our 2TB Seagate (HP branded SAS) even went as high as 14% one year :/

    • David

      I believe these stats are absolutely useful as long as you realize what you are comparing.

      Backblaze is the only vendor sharing this data with anyone. Google/Facebook/Yahoo all have their own data and keep it to themselves for the most part. Yes, the builds of the WD Reds and Seagate LP are different and that might account for some of the failure rate differences.

      The main difference and possible error in extrapolating “global” conclusions is the workload. Most consumers aren’t running pod data centers with 24/7 spinning disks.

      • And statistics on 45 drives are statistically insignificant.

        And it’s entirely a marketing/publicity stunt (repeatedly), because they’re the only ones doing it still. And why don’t Seagate and WDC release info? Because they don’t want people to draw false conclusions… which is what happens every time BackBlaze releases one of these reports.

        • David

          But certainly statistics on 4500 drives are significant. I can’t speak to Seagate’s and WDC’s reasons for not releasing specific information. But they do release some information, such as MTBF rates, which are essentially garbage in the real world, as documented in several different studies.

          Withholding information to “prevent false conclusions” is not a very good reason. I’d rather have the information than have someone decide I’m too stupid to understand it in context.

          As for marketing… it is great marketing and gets the word out about their services which isn’t marketing disk drives.

        • Milk Manson

          Because they don’t want people to draw false conclusions? No, it’s because they don’t want people drawing ANY conclusions. Screw that.

          • Because people are stupid.
            I mean, look at all the people that think that backblaze’s stats are ANYTHING BUT ANECDOTAL.

          • Milk Manson

            If you’re so freaking smart why don’t you tell us where to get better numbers than right here. Either that or shut up, one of the two.

          • Milk, thank you for making my point.

            There aren’t better numbers published. But just because BackBlaze is the only one with published numbers DOESN’T MAKE THEM GOOD.

            I’m not sure I can explain it simpler than that. Aside from “having only one option doesn’t make that a good option”.

          • Milk Manson

            Doesn’t make them bad either, professor.

          • Brett A.

            There aren’t many sources for this sort of information. 4000+ drives tested from a single environment is anecdotal, but it still gives a broader scope than “Dude, I just bought 2 Maxtor Drives and they’re both failing, I’m never buying Maxtor again!!”… That is primarily what we as consumers had to go on, personal experience and extremely limited anecdotal evidence. This report gives a much wider view of actual failure rates.

            Even those of us who had access to people within the tech industry who used large numbers of drives found that those people typically had their own biases. I spoke with a couple of people who worked in somewhat large data-centers that avoided drives by one manufacturer or another, or they only bought drives from one specific manufacturer. That isn’t helpful at all. First, their bias could be dead wrong or unfounded… Second, as this report shows, different models typically have more variance with failure rates than different manufacturers do.

            realistically, any data we get is either marketing BS from the manufacturers, or “Anecdotal” from companies like this, that are kind enough to release their data. Our personal experiences will always play a role in our buying decisions, but if there’s one thing these reports have taught me, it’s to not avoid a manufacturer because of bad experiences, avoid that model instead.

            I praise Backblaze for two things.
            ***First, they release this data, their pod builds, and other stuff that other companies usually keep internal. Despite criticism from some individuals and even some online journalists (which doesn’t make sense to me), they continue to do this. Probably because more people find it interesting or helpful than find it harmful or biased (again, I don’t understand, see #2)
            ***Second, This is unbiased. “How is it unbiased when they have such horrible numbers for *Brand-X*? ” … Well, they are still buying brand-x, despite failure rates being slightly higher overall. That proves a lack of bias. They buy what they can get their hands on. A lot of drives, as cheap as possible. Brand be damned. That is, of course, the logical way to operate, considering failure rates are NOT consistent across all models with any given brand.

    • Man, there always has to be that guy.

    • Milk Manson

      Your second paragraph explains why your first paragraph is wrong, in case you’re wondering.

      If they had left out the model numbers then you might have a point.

      • If they included sources, usage scenarios, etc. Then you would be correct.

        • Milk Manson

          Sources like vendors? And what don’t we know about their usage?

          • Yes. But I mean like Costco, Amazon, Newegg, Tiger Direct, etc.
            Backblaze has a history of sourcing the cheapest drives possible, from places like these… and they’re generally external drives that they’ve shelled/shucked.

            And usage: load average, peak (at both ends), average time at each, etc. You know, basic information that shows how heavily the drives are used.
            Because a much more heavily worked drive is going to fail faster than a drive that’s idling for most of its life. Unless it’s a WD Green and then you’re just screwed either way.

          • Milk Manson

            Who cares whether the drive is from Costco or Amazon? What difference does it make?

            BB has disclosed when they’ve shucked drives.

            I have a history of sourcing the cheapest drives possible as well. Why? Because the more expensive drives are more expensive.

            Who cares how the drives are used as long as they’re all used the same (and we know the make/model). Run them under water, bash them with a hammer, drag them behind your car to work every day– again, who cares as we know the make/model and they submerge, bash, and drag them all equally.

            Because drives from Costco are solely external drives.
            And it’s very likely that these drives have “tweaked” firmware that handles a bunch of different aspects of the drive’s mechanics differently (external is a very different use case than internal).

            Without breaking down the numbers by drive source (or at least which were external drives), it biases the stats.

            And who cares? Anyone that likes ACCURATE AND MEANINGFUL STATISTICS. If the drive was meant to operate underwater, or take hammer blows, it would make a huge difference which were ran underwater or assaulted with a hammer.

            I’m sorry that you appear to be unable to grasp this basic concept.

          • Milk Manson

            “And it’s very likely that these drives have “tweaked” firmware that handles a bunch of different aspects of the drives mechanics differently (external is a very different use case than internal).”

            “Tweaked” firmware! Well why didn’t you say so? Thanks for this, Mr. ACCURATE AND MEANINGFUL STATISTICS guy.

            So what you’re saying is those miserable POS 3tb Seagate drives might actually be first instead of worst if BB had just left them in their shell and turned them on and off a lot more, carried them around in backpacks, unplugged them without ejecting half the time, and maybe left them out in the car a couple nights a week. Cause that’s what portable drives are MADE TO DO, right? Because they have “tweaked” firmware, right?

            Sorry, the only thing tweaking here is you.

          • Now now, let’s be civil! One thing I can say is that there was almost no difference as far as we could find when we took the 3TB Seagates that we “shucked” and compared them to the internal 3TB Seagates that we bought from the distributors. They more or less failed in similar fashions.

          • cnlson

            @drashna:disqus I hate to say this, as I currently have 2 remaining 3TB Seagates, but what failed repeatedly in mine (2 failures in 3 years on one external drive installation) was reallocated sectors. I believe there is another post which details all of their Seagate drives, showing roughly 2,000 internal and 2,000 external 3TB drives, and the internals failed at a higher rate than the shucked externals. In that post, they say the same thing: high rates of reallocated sectors until there are no spare sectors to reallocate anymore. In my case, luckily I have a program that checks the SMART values daily – Hard Drive Sentinel – and it will either email me or alarm so I know to check, back up and replace. During the same period that the 3TB failed twice, I also had a 2TB WD that was a year or two older fail, and it caught that one also. I still buy Seagate; I have a 5TB and an 8TB as well as the 2 remaining 3TB drives (they were in minimal-use situations, less than 100 hours on either) and 2TB and 1TB portables. But I’m not going to say or accept that firmware is going to make sectors on the hard drive fail somehow. I can see it failing the motor if it stopped and started more, but not the sectors.

          • jp

            To get the most longevity on external devices, the firmware makes a lot of tradeoffs in performance and certain types of reliability to make them resilient to being moved or dropped while on, or unplugged while writing. Some of these changes cause certain mechanical parts of the drive to do more work, thus wearing them out faster than they would in an internal drive that doesn’t have to be optimized for that to nearly the same extent.

          • Milk Manson

            So they make externals more durable at the cost of performance, is that what you’re telling me?

          • jp

            And they make them more resistant to certain failure modes at the cost of making them less resistant to others. A good example is head parking. Less likely to be damaged by motion if the head is parked but excessive parking will make the head fail quicker under continuous use

          • Milk Manson

            Resistance to certain failure modes, more durable. Six of one, half dozen of another.

            Head parking might be a good example of how the two types of drives are different, but it’s not a good example if you’re trying to show these stats are slanted or unfair towards shucked drives. The BB drives might be always powered on, but that is not the same thing as continuously used. BB drives basically fill up then sit around smoking cigarettes for the rest of their lives waiting for something to happen.

          • jp

            They are not the same. Durability would be their overall resistance to failure; shuffling around which failures they are more vulnerable to doesn’t necessarily do that. They focus their engineering on preventing, in the designed use case, the significantly early failure of any one of the components relative to the others. For different use cases this is indeed a different balance. So a drive designed for one environment might last 10 years on average there but only 3 in another, while one designed for that other environment might make 6 years there but only two in the first drive’s designed environment. That doesn’t necessarily mean it is less durable; it just has different use cases.

          • timothyhood

            These are great theories. Too bad there is no real information to back them up. Also, considering that manufacturers don’t know how a drive is going to be used ultimately, how can this be implemented in the real world? And, why bother with all of this extra engineering cost if it gains nothing in the end? Let’s put an end to this “magic firmware” story…

    • Russ Wright

      Ask for a refund. Backblaze doesn’t have to provide the stats at all. You pay (assuming you are a customer) for backup services

      • It would be better if they didn’t post the “statistics” at all.

        They’re meaningless and people STILL use them to back up their cognitive biases.

        And to be blunt, I’d never trust my data with a company that uses crap (consumer) quality drives. Period.
        They don’t care about reliability and integrity, so why the HELL would I trust my data with them?!

        • Thisis Myname

          Hilarious. If you want to get fleeced buying “enterprise” drives, that’s all on you.

          • That you think you get fleeced for buying enterprise drives tells me pretty much everything I need to know: That you don’t know anything about storage.

          • Thisis Myname

            Thanks for the laughs.

  • andrewc2

    Wow talk about timing, I was just reading the post from last year and thought man I hope they do another of these, and boom there it is. Thanks!

  • Turguy Goker

    Hi Andy

    You guys rock. This is great data and I am sure once we have a chance to go over your actual database, the details might be very intriguing indeed. A quick question: are you observing higher Latent Sector Errors with higher capacity drives, especially the 8TB ones, compared to earlier lower capacity ones, and is there a difference in the latent sector error numbers between the manufacturers and models?

    Regards

    TG

    • Andy Klein

      I’m not sure which SMART stat you are referring to by Latent Sector Errors? In our environment, the larger drives perform similarly to the “smaller” ones, but if there is a particular SMART stat you are interested in, the data files contain all the available SMART stats for all the drives we track.

      • Turguy Goker

        Hi Andy
        If possible, it might be useful to see the following SMART data collected with a timestamp so one can analyze whether the errors are random, correlated, or bursty.
        Reallocated Sector Count

        SATA Downshift Error Count or Runtime Bad Block

        Reported Uncorrectable Errors

        Uncorrectable Sector Count

        • Turguy Goker

          Assume that we have 2TB and 6TB drives in a RAID6 using an 8/2 configuration. Assume that these are SATA drives with an unrecoverable error spec of 1 error event per 10^15 bits transferred. The system is undergoing a rebuild due to a drive failure and encounters another one, so it is out of RAID6 protection parities. Now any error will result in data loss. In this situation, what is the probability of an unrecoverable error event, as a percentage, based on 4K sectors and the SATA spec of 10^15 bits, for both drive capacities? Our analysis, using a random error event assumption, is 25% for 6TB and 9% for 2TB, which indicates that as capacities grow the unrecoverable error probability will dominate. However, these are mathematically computed numbers and it would be intriguing to see if we can actually base them on experimental data. It’s a long shot, but I wonder if the SMART data from ~49K drives can actually indicate anything regarding the complexity of this critical drive spec. Given the size of your system, we would need to be able to read at least 100X the 1E11 number of 4K sectors to claim that the sector error rate is 1 in 3×10^10. Note, I translated the unrecoverable error event from 1 error event in 10^15 bits to 1 error event in 3×10^10 sectors read, where each sector is 4K in size.
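
          For reference, a minimal sketch of that calculation (how many drives’ worth of data must be read during the rebuild is an assumption; reading roughly six full drives reproduces the 25%/9% figures quoted above):

          ```python
          import math

          def p_unrecoverable(capacity_tb, drives_read, bits_per_error=1e15):
              """Probability of at least one unrecoverable read error while reading
              drives_read full drives of capacity_tb TB, assuming independent errors
              at a rate of one per bits_per_error bits transferred."""
              bits_read = drives_read * capacity_tb * 1e12 * 8
              return 1 - math.exp(-bits_read / bits_per_error)

          print(p_unrecoverable(6, 6))  # ~0.25 for 6TB drives
          print(p_unrecoverable(2, 6))  # ~0.09 for 2TB drives
          ```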

          • Andy Klein

            The SMART data is collected once a day at the same time each day. The data as collected is good for trends, but lacks some information you might need. For example, we don’t provide in the data files the RAID array a given hard drive is/was assigned to. So the type of analysis you outlined might not be possible with the data as it is recorded right now.

          • Turguy Goker

            I understand the potential issues; however, if you have the capability of collecting SMART data from the 47K+ drives, the data I am suggesting might be useful for analyzing the error rate performance of these drives as a function of aging. Again, thank you guys for this effort in general, which I find extremely useful.

  • frogstein

    Would be nice to see a graph of failure rate vs. age for each model, rather than just year of failure. Percent failures per year doesn’t tell me how old each drive was when it failed.

    • Andy Klein

      Good idea, although a lot of work. We’ll see what we can do for our end of year summary.

      • Paul Biedler

        I’m a reliability engineer by trade, and ideally, what you want to do is express the data as a reliability or survival number with respect to time. The appropriate way to do this is through a Weibull distribution.

        I’ve taken the data up through Q2 (haven’t done the Q3 data) and processed it to put all the failures and suspensions together for the individual hard drive models and then created a Weibull distribution for 11 of the most used hard drives listed above. This provides a probabilistic basis for assessing future failures of your different hard types. I’ve posted a picture of the resulting models plotted of Reliability (1 = 100% of the units will have survived) vs Time (if the link below works). For instance, for HDS723030ALA640, at 25000 hrs, we would expect 97.8% to have survived (2.2% failure rate).

        As commented elsewhere, this is a much better way of expressing the expected reliability as a function of hours of operation, rather than annualized failure rate, as this will take into consideration the “infant mortality” effects, and “wear out” effects of the data, and actually predict them.

        Interestingly, the data seems to group itself – that is, the models for the Western Digital drives of the WD**EFRX type seem to have a similar Weibull “Beta” parameter, meaning they have a dominant failure mode of a similar type. Also interesting is that subsequent models (HGST ******ALA**** vs BLE) seem to have better reliability than previous ones, which is good to see. The interesting one will be the 4TB Seagates – my model predicts 0.6% of the population will fail in the next 3 months or so. This would equate to roughly 120 failures.

        There was a comment earlier regarding tradeoff points – one thing that probably makes it “less worth it” is that there probably isn’t a significant cost differential for an “unscheduled failure” vs a “scheduled removal”, unlike say, the airline industry, where such a failure can cost an aircraft.

        https://drive.google.com/file/d/0BxBp8Vkho2pIejNTM2JWU0hpdEk/view?usp=sharing
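
        For anyone who wants to attempt a similar fit, here is a minimal sketch of fitting a Weibull model to right-censored lifetimes (hours of service, with failed drives marked as events and still-running drives as suspensions). The input values are placeholders, and the lifelines library is just one option for handling the censoring; it is not necessarily what was used for the models above.

        ```python
        from lifelines import WeibullFitter

        # Hours of service for each drive of one model, and whether the drive
        # failed (1) or is a suspension, i.e. still running or removed unfailed (0).
        # These values are placeholders, not real Backblaze data.
        durations = [24869, 31000, 12000, 45000, 52000, 60000]
        observed = [1, 1, 0, 0, 0, 0]

        wf = WeibullFitter()
        wf.fit(durations, event_observed=observed)

        print(wf.lambda_)  # scale parameter: the "characteristic life" (eta), in hours
        print(wf.rho_)     # shape parameter (beta)
        print(wf.survival_function_at_times([25000]))  # fraction expected to survive 25,000 hours
        ```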

        • Adrian Kelly

          Thank you for doing this work! These kind of survival rate vs time graphs are exactly what Backblaze should be publishing, or at the very least failure rates at different time intervals. They have a huge amount of data at their disposal and it could be so much more useful.

        • David Fickes

          I’m wondering if it should be a Weibull distribution at all. I took all of the failures together for 2015 and then ran an input analysis which suggested that a Beta function was a better fit. I’ve thought of going through each of the drive models and running the same input analysis for each drive.

          Any suggestions?

          • Paul Biedler

            Grouping all the distributions together would be very unlikely to get a good distribution fit, because the underlying characteristics for each of the drives would be different. In addition, the various drive brands themselves are not of any particular distribution, so the failures of different brands are going to weight differently, which is going to skew things. No, you’d have to go through each of the drive models, which is what I’ve done here, although I need to update through Q4.

          • David Talaga

            The Weibull distribution is not at all consistent with the empirical distribution of survival. https://www.backblaze.com/blog/wp-content/uploads/2013/11/blog-drivestats-3-lifecycles.jpg

            The Weibull distribution assumes a unimodal distribution of failure rates. The data suggests a trimodal distribution of failure rates. Moreover it also suggests that there are different classes of failure with different rate behaviors. That is not consistent with the Weibull picture.

            So, I disagree. The Weibull distribution is not at all appropriate to model this data.

            I would probably model the failure using a set of coupled kinetic equations that account for the 3 modes of failure. I think it would be difficult to capture the phenomenon with fewer than 4 parameters. One parameter would capture the fraction of units subject to infant mortality. Another parameter would capture the baseline rate of failure. Another would capture the accumulation of wear resulting in wear-out failures. You’d probably need one or two more parameters to handle the relative time scales and proportions.
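
            One way such a model could be parameterized (a rough sketch; the functional forms and parameter values below are invented for illustration, not fitted to the data):

            ```python
            import math

            def survival(t, f_infant, lam_infant, lam_base, tau_wear):
                """Mixture survival model: a fraction f_infant of units sees a high
                early-failure rate; the rest see a constant baseline rate plus a
                wear-out term that grows with accumulated time."""
                weak = f_infant * math.exp(-lam_infant * t)
                strong = (1 - f_infant) * math.exp(-(lam_base * t + (t / tau_wear) ** 2))
                return weak + strong

            # Illustrative parameters: 3% infant-mortality fraction, rates in per-hour units.
            print(survival(10_000, f_infant=0.03, lam_infant=1e-3, lam_base=2e-6, tau_wear=80_000))
            ```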

        • calmdownbro

          Well, the data is available for free of charge. Grab it, do it, share it.

        • Dave Grodzki

          Small world Paul… Stumbled upon this article looking into some reliability for a drive I use in my NAS, and saw your comment. Your plot is very useful.

    • Also, vs. I/Os or some measure of data transferred.

  • Tracy Valleau

    The subject is near and dear, having been using consumer hard drives since the day they came out. (38 years in the industry now. I really need to get a life!) The results mirror my own experiences. (I have nearly 50 hard drives right now… who knows how many over the years… ) But really, this is just a thank you for continuing with these reports. I appreciate it.

    • Our pleasure Tracy! We’re glad you like the posts! Thanks for reading :)