Regular Hard Drive Stats readers will recall that in our blog post about Q3 2019 we said we planned to take a closer look at some drive failures we were seeing at the time and report back when we knew more. We’ve been monitoring the situation since then and want to update you on where things stand. Although Hard Drive Stats for 2019 are just around the corner, we decided to share this information with you as soon as we could rather than wait for the next post. In summary, this year (and going into the next year) we expect to see higher failure rates in some of our hard drives, and we will be migrating some drives to newer models. Below, we’ll discuss what’s going on, what we’re doing about it, and why customers shouldn’t worry.
So What’s Up?
In a recent blog post, we interviewed our Director of Supply Chain, Ariel Ellis, about how we purchase and qualify hard drives to be deployed in our data centers. The TL;DR is that our qualification process is robust. Nevertheless, for all providers of scale in the cloud storage industry, trends that are hard to project during testing can emerge over time, after drives are put into production in batches of dozens of petabytes or more at a time.
What we’re seeing in our fleet right now is a higher-than-typical failure rate among some of our 12TB Seagate drives. It’s customary for hard drive manufacturers like Seagate to work with data centers and cloud service providers to ensure the successful deployment of large-scale drive fleets, and as such we’re working closely with them to analyze the drives and their performance. This kind of collaboration usually includes things like testing new drive platforms in real workload environments, providing telemetry tools to predict failures, performing ongoing custom adjustments, developing firmware, and supplying replacement units (RMAs). Customer data durability is paramount for both Backblaze and Seagate, so as we analyze root causes and implications we’re also working together on a migration effort to replace these particular drives in our data centers. In the short term, failure rates for a subset of our drives may increase, but we have processes in place to adjust for that fluctuation.
Running a cloud business is complex, so it’s very helpful to have a partner like Seagate who can help us react quickly and bring their expertise in drive deployment to bear on our migration efforts. It’s worth noting that situations like this are not uncommon in our industry and often go unnoticed by the end users of the services, as most cloud providers do not inform customers or the public when they experience issues like the ones we’re describing. Backblaze, on the other hand, is a bit more open than most companies in the industry.
We’re in a unique position because of the Hard Drive Stats that we publish, which is why we felt it was important to let folks know about these changes ahead of time. We think this openness is helpful for everyone, especially our customers.
In the near term, we expect to see moderately increased failure rates for this specific subset of 12TB drives, but as we complete the drive migration, we project that our fleet’s failure rates will return to historical norms. Meanwhile, it will be business as usual. We’ll continue to provide the most reliable, affordable, and easy-to-use cloud storage and computer backup available, and we’ll continue to publish our Hard Drive Stats for you every quarter.