Why does a company that keeps more than 25,000 disk drives spinning all the time not know how long they last? Backblaze has been providing reliable and unlimited online backup for over five years. For the past four years, we’ve had enough drives to provide good statistics, but 78% of the drives we buy are living longer than four years. So while 22% of drives fail in their first four years, and we have detailed information about the failure rates of drives in their first four years, we don’t yet know what will happen beyond that. So how long do drives last? Keep reading.
How Drives Are Used At Backblaze
Backblaze uses lots of hard drives for storing data. 45 drives are mounted in each Backblaze Storage Pod, and the Storage Pods are mounted in racks in our data centers. As new customers sign up, we buy more disk drives, test them, and deploy them. We are up to 75 petabytes of cloud storage now.
Before being deployed, each Backblaze Storage Pod is tested, including tests on all of the drives in it. Recently, Andy posted about Poor Stephen, a disk drive that failed this testing. His post describes the process Backblaze uses to set up, load test, and deploy a Storage Pod.
Types Of Hard Drives In The Analysis
Backblaze has standardized on “consumer-grade” hard drives. While hard drive companies say these drives are not designed to work in RAID arrays or the 24×7 workload of a data center environment, Backblaze uses software redundancy to protect data. In a future blog post we will delve into the statistics comparing “consumer” and “enterprise” hard drives.
By far the majority of these hard drives are “raw” or “internal” hard drives. However, because the Thailand Drive Crisis made it nearly impossible to find internal hard drives for sale at reasonable prices, Backblaze started to farm hard drives. Thus, approximately 6 petabytes of the drives in this analysis were originally “external” hard drives that were “shucked” out of their enclosures.
The chart below shows the age distribution of the drives in the Backblaze data centers. The shape of the chart is mostly a reflection of the growth of the company, and the addition of drives as the customer base grew. Overall, not that many drives fail.
Before diving into the data on failure rates, it’s worth spending a little time clarifying what exactly a failure rate means. At first glance, you might think that a failure rate of 100% is the worst possible. Every drive is failing! That’s not the whole story, though.
Imagine you have a disk drive supplier who provides drives that are 100% reliable for six months, but then all fail at that point. What’s the annual failure rate? If you have to keep 100 drives running at all times, you’ll have to replace the drive in every slot twice a year. That means that you’ll have to replace 200 drives each year, which makes your annual failure rate 200%. So, in theory at least, there is no worst possible failure rate. If every drive failed after one hour of use, the annual failure rate would be 876,000%. Fortunately, the drives that Backblaze gets are more reliable than that.
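The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the definition used in this post, assuming every drive lasts exactly the same number of hours and failed drives are replaced immediately; the function name `annual_failure_rate` is made up for the example.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_failure_rate(lifetime_hours: float) -> float:
    """Percent of drive slots replaced per year.

    Assumes (for illustration only) that every drive lasts exactly
    lifetime_hours and is replaced the moment it fails.
    """
    replacements_per_slot_per_year = HOURS_PER_YEAR / lifetime_hours
    return 100.0 * replacements_per_slot_per_year


# Drives that all die at six months: each slot is refilled twice a year.
print(annual_failure_rate(HOURS_PER_YEAR / 2))  # 200.0

# Drives that all die after one hour of use.
print(annual_failure_rate(1))  # 876000.0
```

Because the rate counts replacements per slot rather than the fraction of drives that ever fail, it has no upper bound.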
The Bathtub Curve
Reliability engineers use something called the Bathtub Curve to describe expected failure rates. The idea is that failures come from three factors: (1) factory defects, resulting in “infant mortality”, (2) random failures, and (3) parts that wear out, resulting in failures after much use. The chart below (adapted from Wikimedia Commons) shows how these three factors can be expected to produce a bathtub-shaped failure rate curve.
The theory matches the reality that Backblaze experiences. The chart below shows the failure rate of drives in each quarter of their life. For the first 18 months, the failure rate hovers around 5%, then it drops for a while, and then goes up substantially at about the 3-year mark. We are not seeing that much “infant mortality”, but it does look like 3 years is the point where drives start wearing out.
Calculating Life Expectancy
What’s the life expectancy of a hard disk drive? To answer that question, we first need to decide what we mean by “life expectancy”.
When measuring the life expectancy of people, the usual measure is the average number of years remaining at a given age. So when we say that the life expectancy of newborns in the world in 2010 is 67.2 years, we are saying that if we wait until all of those new people have lived out their lives in 120 or 130 years, the average of their lifespans will be 67.2.
For disk drives, it may be that all of them will wear out before they are 10 years old. Or it may be that some of them last 20 or 30 years. If some of them live a long, long time, it makes it hard to compute the average. Also, a few outliers can throw off the average and make it less useful.
The number that we will be able to compute soon, and the one that is more likely to be useful, is the median lifespan of a new drive. In other words, at what age have half of the drives failed? We are starting to get an idea what the answer will be.
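To see why the median is available sooner than the average, here is a small sketch with made-up numbers. It assumes a single batch of drives that were all installed at the same time, and uses `None` for a drive that is still running; the function name and the sample data are hypothetical.

```python
from typing import Optional, Sequence


def median_lifespan(ages_at_failure: Sequence[Optional[float]]) -> Optional[float]:
    """Age (in years) by which half of the drives have failed.

    None in the input means the drive is still running. Returns None
    if fewer than half of the drives have failed so far. Assumes all
    drives were installed at the same time, so every surviving drive
    is older than any failure recorded to date.
    """
    failed = sorted(age for age in ages_at_failure if age is not None)
    needed = (len(ages_at_failure) + 1) // 2  # failures that make up half the batch
    if len(failed) < needed:
        return None
    return failed[needed - 1]


# Hypothetical batch of six drives: three have failed, three are still alive.
print(median_lifespan([0.5, 2.0, 3.5, None, None, None]))  # 3.5

# Only one of four has failed: the median is not known yet.
print(median_lifespan([0.5, None, None, None]))  # None
```

The average, by contrast, cannot be computed until the last drive in the batch has died.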
Disk Drive Survival Rates
On the internet, it’s surprisingly hard to get an answer to the question “How long will a hard drive last?” What you’ll find are mostly anecdotal stories, or perhaps references to Google’s and CMU’s studies, neither of which really answers the question.
The anecdotes you get don’t give you any useful information:
“Hard drives are mechanical and thus will eventually fail. … I’ve had drives arrive DOA, some die after a day, and some that have lasted 10 years. There is just no way to tell how long a drive will last.”
“I don’t know about 5 years. My WD died after 2 years.”
Google’s study has some interesting information on failure rates. They found that temperature doesn’t matter as much as you might think, and that the SMART checks of a drive aren’t very good at predicting drive failure.
CMU’s study found that manufacturers’ MTBF (Mean Time Between Failures) ratings are exaggerated. Drives fail a lot more than the MTBF would indicate.
The chart below shows the percentage of drives at Backblaze that are still alive at different ages:
For the first 1.5 years, drives fail at 5.1% per year.
For the next 1.5 years, drives fail LESS, at about 1.4% per year.
After 3 years, though, failure rates skyrocket to 11.8% per year.
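As a rough check, the three rates can be turned into a survival percentage at any age. The sketch below assumes each rate is in percentage points of the original drive population per year (a straight-line reading of the chart, not necessarily how the chart itself was computed); the names `SEGMENTS` and `percent_alive` are invented for the example.

```python
# (segment length in years, failure rate in percent of the original
# population per year). The last segment is open-ended.
SEGMENTS = [(1.5, 5.1), (1.5, 1.4), (float("inf"), 11.8)]


def percent_alive(age_years: float) -> float:
    """Percent of drives still alive at a given age, using straight-line segments."""
    alive = 100.0
    remaining = age_years
    for length, rate in SEGMENTS:
        span = min(remaining, length)  # years spent in this segment
        alive -= rate * span
        remaining -= span
        if remaining <= 0:
            break
    return max(alive, 0.0)


for age in (1.5, 3.0, 4.0):
    print(age, round(percent_alive(age), 2))
# 1.5 92.35
# 3.0 90.25
# 4.0 78.45
```

Under that assumption, about 78% of drives are still alive at the four-year mark, which lines up with the figure quoted at the top of this post.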
Most Drives Are Still Alive
The chart above could be misleading. At a glance, it appears that most of the drives have already died and all are on track to die within the next year. However, if you redraw the chart with the bottom at 0, you can see that nearly 80% of all the drives Backblaze has ever purchased are still operating!
How Long WILL The Hard Drives Last?
What happens to drives when they’re older than 5 years? Neither Google nor the CMU team presented any data on drives older than 5 years, although the CMU paper has a tantalizing comment in its conclusion claiming that failure rates go up after 5 years. No basis for that assertion is provided, though.
At Backblaze, we’ve been up and running for 5 years, and all of the drives we install are new drives, so we also don’t have any data for drives older than that. We are looking forward to finding out what will happen when drives become 5, 6, 7, and 8 years old.
If you extrapolate the line from the previous chart to estimate the point at which half of the drives have died, you get a prediction:
The median lifespan of a drive will be over 6 years.
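The extrapolation can be reproduced with one line of arithmetic. This sketch assumes the 11.8% per year rate seen after the 3-year mark simply continues unchanged, measured in percentage points of the original population; that is an assumption, since there is no data yet for older drives. The function name is hypothetical.

```python
def projected_median_age(alive_now_pct: float, age_now_years: float,
                         rate_pct_per_year: float) -> float:
    """Age at which survival reaches 50%, extending the current line.

    Assumes drives keep failing at a constant rate_pct_per_year
    (percentage points of the original population per year).
    """
    years_until_half = (alive_now_pct - 50.0) / rate_pct_per_year
    return age_now_years + years_until_half


# 78% alive at 4 years, failing at 11.8% per year from here on.
print(round(projected_median_age(78.0, 4.0, 11.8), 1))  # 6.4
```

If failure rates keep climbing as drives age, the true median would come sooner; if they level off, later.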
When Backblaze started, there were some concerns that consumer-grade disk drives wouldn’t hold up in a data center. If this 6-year median lifespan is true, it means that more than half the drives will last six years, and those concerns were unfounded. We intend to continue to update these statistics quarterly. Thus, over the next couple of years, we’ll have hard data on the median lifespan of hard drives. Stay tuned to the blog to find out the answers.
Nov 14: Update
My bad: Due to a transcription error, the percentages in the second paragraph were wrong, and were more pessimistic than necessary. 78% (not 74%) of drives are still alive after four years. The projection of a six-year median lifespan is not affected by this change. Thanks to sharp-eyed Frédéric for catching the error. – Brian