Last month I dug into drive failure rates based on the 25,000+ consumer drives we have and found that consumer drives actually performed quite well. Over 100,000 people read that blog post and one of the most common questions asked was:
“Ok, so the consumer drives don’t fail that often. But aren’t enterprise drives so much more reliable that they would be worth the extra cost?”
Well, I decided to try to find out.
In the Beginning
As many of you know, when Backblaze first started the unlimited online backup service, our founders bootstrapped the company without funding. In this environment one of our first and most critical design decisions was to build our backup software on the premise of data redundancy. That design decision allowed us to use consumer drives instead of enterprise drives in our early Storage Pods as we used the software, not the hardware, to manage redundancy. Given that enterprise drives were often twice the cost of consumer drives, the choice of consumer drives was also a relief for our founders’ thin wallets.
There were warnings back then that using consumer drives would be dangerous, with people saying:
“Consumer drives won’t survive in the hostile environment of the data center.”
“Backblaze Storage Pods allow too much vibration – consumer drives won’t survive.”
“Consumer drives will drop dead in a year. Or two years. Or …”
As we have seen, consumer drives didn’t die in droves, but what about enterprise ones?
In my post last month on disk drive life expectancy, I went over what an annual failure rate means. It’s the average number of failures you can expect when you run one disk drive for a year. The computation is simple:
Annual Failure Rate = (Number of Drives that Failed / Number of Drive-Years)
Drive-years are a measure of how many drives have been running for how long. This computation is also simple:
Drive-Years = (Number of Drives x Number of Years)
For example, one drive for one year is one drive-year. Twelve drives for one month is also one drive-year.
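As a minimal sketch of the two formulas above, the calculation can be written in a few lines of Python. The function names are illustrative only and are not part of any Backblaze tooling. Exact fractions are used so that twelve drives for one month comes out to exactly one drive-year, and the final failure count is a made-up example.

```python
from fractions import Fraction

def drive_years(num_drives, years):
    """Drive-Years = Number of Drives x Number of Years."""
    return num_drives * years

def annual_failure_rate(failures, total_drive_years):
    """Annual Failure Rate = Number of Drives that Failed / Number of Drive-Years."""
    return failures / total_drive_years

# One drive for one year is one drive-year.
one_drive = drive_years(1, Fraction(1))

# Twelve drives for one month (1/12 of a year) is also one drive-year.
twelve_drives = drive_years(12, Fraction(1, 12))

# Hypothetical example: 100 drives running for half a year with 2 failures.
example_rate = annual_failure_rate(2, drive_years(100, Fraction(1, 2)))

print(one_drive, twelve_drives, float(example_rate))
```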
Backblaze Storage Pods: Consumer-Class Drives
We have detailed day-by-day data about the drives in the Backblaze Storage Pods since mid-April of 2013. With 25,000 drives ranging in age from brand-new to over 4 years old, that’s enough to slice the data in different ways and still get accurate failure rates. Next month, I’ll be going into some of those details, but for the comparison with enterprise drives, we’ll just look at the overall failure rates.
We have data that tracks every drive by serial number, which days it was running, and if/when it was replaced because it failed. We have logged:
14719 drive-years on the consumer-grade drives in our Storage Pods.
613 drives that failed and were replaced.
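To illustrate how per-drive records like these roll up into drive-years and a failure count, here is a rough sketch. The record layout, the serial numbers, and the 365-day year are all assumptions made for the example; this is not the actual format of our logs.

```python
from dataclasses import dataclass

@dataclass
class DriveRecord:
    serial: str        # hypothetical serial number
    days_running: int  # number of days the drive was in service
    failed: bool       # True if it was replaced because it failed

def tally(records):
    """Return (drive_years, failures) for a collection of drive records."""
    total_days = sum(r.days_running for r in records)
    failures = sum(1 for r in records if r.failed)
    # Assumes a 365-day year when converting drive-days to drive-years.
    return total_days / 365, failures

# Made-up records for illustration only.
records = [
    DriveRecord("A001", 365, False),
    DriveRecord("A002", 180, True),
    DriveRecord("A003", 185, False),
]

years, failures = tally(records)
print(years, failures)
```

A drive that fails partway through the year still contributes the days it actually ran, which is why the tally sums days per drive rather than simply counting drives.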
Commercially Available Servers: Enterprise-Class Drives
We store customer data on Backblaze Storage Pods which are purpose-built to store data very densely and cost-efficiently. However, we use commercially available servers for our central servers that store transactional data such as sales records and administrative activities. These servers provide the flexibility and throughput needed for such tasks. These commercially available servers come from Dell and from EMC.
All of these systems were delivered to us with enterprise-class hard drives. These drives were touted as solid long-lasting drives with extended warranties.
The specific systems we have are:
We have also been running one Backblaze Storage Pod full of enterprise drives storing users’ backed-up files as an experiment to see how they do. So far, their failure rate has been statistically consistent with drives in the commercial storage systems.
In the two years since we started using these enterprise-grade storage systems, they have logged:
368 drive-years on the enterprise-grade drives.
17 drives that failed and were replaced.
Enterprise vs. Consumer Drives
At first glance, it seems the enterprise drives don’t have that many failures. While that is true in absolute numbers, the failure rate of enterprise drives is actually higher than that of the consumer drives!
| |Enterprise Drives|Consumer Drives|
|---|---|---|
|Drive-Years of Service|368|14719|
|Number of Failures|17|613|
|Annual Failure Rate|4.6%|4.2%|
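The rates in the table follow directly from the failure counts and drive-years. A quick check in Python, using only the figures from the table (the dictionary layout is just for illustration):

```python
# Figures taken from the table above.
fleets = {
    "Enterprise Drives": {"drive_years": 368, "failures": 17},
    "Consumer Drives": {"drive_years": 14719, "failures": 613},
}

# Annual Failure Rate = failures / drive-years
rates = {name: f["failures"] / f["drive_years"] for name, f in fleets.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```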
It turns out that the consumer drive failure rate does go up after three years, but all three of the first three years are pretty good. We have no data on enterprise drives older than two years, so we don’t know if they will also have an increase in failure rate. It could be that the vaunted reliability of enterprise drives kicks in after two years, but because we haven’t seen any of that reliability in the first two years, I’m skeptical.
You might object to these numbers because the usage of the drives is different. The enterprise drives are used heavily. The consumer drives are in continual use storing users’ updated files and they are up and running all the time, but the usage is lighter. On the other hand, the enterprise drives we have are coddled in well-ventilated low-vibration enclosures, while the consumer drives are in Backblaze Storage Pods, which do have a fair amount of vibration. In fact, the most recent design change to the pod was to reduce vibration.
Overall, I argue that the enterprise drives we have are treated as well as the consumer drives. And the enterprise drives are failing more.
So, Are Enterprise Drives Worth The Cost?
From a pure reliability perspective, the data we have says the answer is clear: No.
Enterprise drives do have one advantage: longer warranties. That’s a benefit only if the higher price you pay for the longer warranty is less than what you expect to spend on replacing the drive.
This leads to an obvious conclusion: If you’re OK with buying the replacements yourself after the warranty is up, then buy the cheaper consumer drives.
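The warranty trade-off can be roughed out with a back-of-the-envelope calculation. Everything in this sketch is an assumption for illustration: the prices are placeholders (the enterprise price is simply set to twice the consumer price), the two extra warranty years are invented, and the failure rate is held flat at the overall consumer rate even though the rate does rise as drives age.

```python
def expected_replacement_cost(annual_failure_rate, extra_warranty_years, replacement_price):
    """Expected out-of-pocket spend on replacements during the years that
    only the longer warranty would have covered (simple linear estimate)."""
    return annual_failure_rate * extra_warranty_years * replacement_price

# All prices and warranty lengths below are hypothetical placeholders.
consumer_price = 100.0     # assumed consumer drive price
enterprise_price = 200.0   # assumed: twice the consumer price
extra_warranty_years = 2   # assumed extra coverage from the longer warranty
afr = 0.042                # overall consumer annual failure rate from the table

premium = enterprise_price - consumer_price
expected_cost = expected_replacement_cost(afr, extra_warranty_years, consumer_price)

# The longer warranty only pays off if the premium is below the expected cost.
warranty_pays_off = premium < expected_cost
print(premium, expected_cost, warranty_pays_off)
```

With these placeholder numbers the premium is far larger than the expected replacement spend. Plug in your own prices and warranty terms to see where the break-even falls.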