As of March 31, 2020, Backblaze had 132,339 spinning hard drives in our cloud storage ecosystem spread across four data centers. Of that number, there were 2,380 boot drives and 129,959 data drives. This review looks at the Q1 2020 and lifetime hard drive failure rates of the data drive models currently in operation in our data centers and provides a handful of insights and observations along the way. In addition, near the end of the post, we review a few 2019 predictions we posed a year ago. As always, we look forward to your comments.
Hard Drive Failure Stats for Q1 2020
At the end of Q1 2020, Backblaze was using 129,959 hard drives to store customer data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 60 drives (see why below). This leaves us with 129,764 hard drives. The table below covers what happened in Q1 2020.
Notes and Observations
The Annualized Failure Rate (AFR) for Q1 2020 was 1.07%. That is the lowest AFR for any quarter since we started keeping track in 2013. In addition, the Q1 2020 AFR is significantly lower than the Q1 2019 AFR which was 1.56%.
During this quarter 4 (four) drive models, from 3 (three) manufacturers, had 0 (zero) drive failures. None of the Toshiba 4TB and Seagate 16TB drives failed in Q1, but both models had fewer than 10,000 drive days during the quarter. As a consequence, a small change in the number of drive failures can swing the AFR widely. For example, if just one Seagate 16TB drive had failed, the AFR would be 7.25% for the quarter. Similarly, the Toshiba 4TB drive AFR would be 4.05% with just one failure in the quarter.
In contrast, both of the HGST drives with 0 (zero) failures in the quarter have a reasonable number of drive days, so the AFR is less volatile. If the 8TB model had 1 (one) failure in the quarter, the AFR would only be 0.40%, and the 12TB model would have an AFR of just 0.26% with 1 (one) failure for the quarter. In both cases, the 0% AFR for the quarter is impressive.
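The sensitivity can be sketched with a few lines of Python. The drive-day figures below are hypothetical, back-computed from the what-if AFRs quoted above rather than taken from the actual Q1 tables:

```python
def afr(failures: int, drive_days: float, days_in_year: int = 366) -> float:
    """Annualized Failure Rate as a percentage (2020 is a leap year)."""
    return failures / (drive_days / days_in_year) * 100

# Hypothetical drive-day totals implied by the quoted what-if AFRs.
seagate_16tb_days = 5_048   # small exposure: fewer than 10,000 drive days
hgst_8tb_days = 91_500      # much larger exposure

# A single failure swings the small-exposure model far more.
print(f"Seagate 16TB, 1 failure: {afr(1, seagate_16tb_days):.2f}%")  # ~7.25%
print(f"HGST 8TB, 1 failure:     {afr(1, hgst_8tb_days):.2f}%")      # ~0.40%
```

With zero failures both AFRs are 0%, but only the larger drive-day total makes that number meaningful.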
There were 195 drives (129,959 minus 129,764) that were not included in the list above because they were used as testing drives or we did not have at least 60 drives of a given model. For example, we have 20 Toshiba 16TB drives (model: MG08ACA16TA), 20 HGST 10TB drives (model: HUH721010ALE600), and 20 Toshiba 8TB drives (model: HDWF180). When we report quarterly, yearly, or lifetime drive statistics, models with fewer than 60 drives are not included in the calculations or graphs. We use 60 drives as the minimum because there are 60 drives in all newly deployed Storage Pods.
That said, all the data from all of the drive models, including boot drives, is included in the files which can be accessed and downloaded on our Hard Drive Test Data webpage.
Computing the Annualized Failure Rate
Throughout our reports we use the term Annualized Failure Rate (AFR). The word “annualized” here means that regardless of the period of observation (month, quarter, etc.) the failure rate will be scaled to an annual measurement. For a given group of drives (e.g., by model or manufacturer) we compute the AFR for a period of observation as follows:
AFR = (Drive Failures / (Drive Days / 366)) * 100
- Drive Failures is the number of drives that failed during the period of observation.
- Drive Days is the number of days all of the drives being observed were operational during the period of observation.
- There are 366 days in 2020; in non-leap years we use 365.
Example: Compute the AFR for drive model BB007 for the last six months, given:
- There were 28 drive failures during the period of observation (six months).
- There were 6,000 hard drives at the end of the period of observation.
- The total number of days all of the drives of drive model BB007 were in operation during the period of observation (6 months) totaled 878,400 days.
AFR = (28 / (878,400 / 366)) * 100 = (28 / 2,400) * 100 = 1.17%
For the six month period, drive model BB007 had an annualized failure rate of 1.17%.
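The formula and the BB007 example translate directly into a short Python function, shown here as a sanity check:

```python
def afr(failures: int, drive_days: float, days_in_year: int = 366) -> float:
    """Annualized Failure Rate (%) from failures and total drive days."""
    return failures / (drive_days / days_in_year) * 100

# Drive model BB007: 28 failures over 878,400 drive days in six months.
print(f"{afr(28, 878_400):.2f}%")  # 1.17%
```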
But What About Drive Count?
Some of you may be wondering where “drive count” fits into this formula. It doesn’t, and that bothers some folks. After all, wouldn’t it be easier to calculate the AFR as:
AFR = (Drive Failures / Drive Count) * (366 / days in period of observation) * 100
Let’s go back to our example in the previous paragraph. There were 6,000 hard drives in operation at the end of the period of observation; doing the math:
AFR = (28 / 6000) * (366 / 183) * 100 = (0.00467) * (2) * 100 = 0.93%
Using the drive count method, model BB007 had a failure rate of 0.93%. The reason for the difference is that Backblaze is constantly adding and subtracting drives. New Backblaze Vaults come online every month; new features like S3 compatibility rapidly increase demand; migration replaces old, low capacity drives with new, higher capacity drives; and sometimes there are cloned and temp drives in the mix. The environment is very dynamic. The drive count on any given day over the period of observation will vary. When using the drive count method, the failure rate is based on the day the drives were counted. In this case, the last day of the period of observation. Using the drive days method, the failure rate is based on the entire period of observation.
In our example, the following table shows the drive count as we added drives over the six month period of observation:
When you total up the number of drive days, you get 878,400, but the drive count at the end of the period of observation is 6,000. The drive days formula responds to the change in the number of drives over the period of observation, while the drive count formula responds only to the count at the end.
The failure rate of 0.93% from the drive count formula is significantly lower, which is nice if you are a drive manufacturer, but not correct for how drives are actually integrated and used in our environment. That’s why Backblaze chooses to use the drive days method as it better fits the reality of how our business operates.
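For comparison, here is a minimal sketch of both methods side by side, using the BB007 numbers from the example (the function names are my own):

```python
def afr_drive_days(failures, drive_days, days_in_year=366):
    """Drive-days method: weights every day each drive was in service."""
    return failures / (drive_days / days_in_year) * 100

def afr_drive_count(failures, drive_count, period_days, days_in_year=366):
    """Drive-count method: uses only the count on the day drives were tallied."""
    return failures / drive_count * (days_in_year / period_days) * 100

# BB007: 28 failures, 878,400 drive days, 6,000 drives at the end of 183 days.
print(f"drive days:  {afr_drive_days(28, 878_400):.2f}%")      # 1.17%
print(f"drive count: {afr_drive_count(28, 6_000, 183):.2f}%")  # 0.93%
```

The two methods agree only when the drive count is constant over the whole period; the more drives are added or retired mid-period, the further the count-based figure drifts.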
Predictions from Q1 2019
In the Q1 2019 Hard Drive Stats review we made a few hard drive-related predictions of things that would happen by the end of 2019. Let’s see how we did.
Prediction: Backblaze will continue to migrate out 4TB drives and will have fewer than 15,000 by the end of 2019: we currently have about 35,000.
- Reality: 4TB drive count as of December 31, 2019: 34,908.
- Review: We were too busy adding drives to migrate any.
Prediction: We will have installed at least twenty 20TB drives for testing purposes.
- Reality: We have zero 20TB drives.
- Review: We have not been offered any 20TB drives to test or otherwise.
Prediction: Backblaze will store over one exabyte (1,000 petabytes) of customer data by the end of 2019.
- Reality: We announced one exabyte in March of 2020, just past the end of 2019.
- Review: To quote Maxwell Smart, “Missed it by that much.”
Prediction: We will have installed, for testing purposes, at least 1 HAMR based drive from Seagate and/or 1 MAMR drive from Western Digital.
- Reality: Not a sniff of HAMR or MAMR drives.
- Review: Hopefully by the end of 2020.
In summary, I think I’ll go back to my hard drive statistics and leave the prognosticating to soothsayers and divining rods.
Lifetime Hard Drive Stats
The table below shows the lifetime failure rates for the hard drive models we had in service as of March 31, 2020. The reporting period is from April 2013 through March 31, 2020. All of the drives listed were installed during this timeframe.
The Hard Drive Stats Data
The complete data set used to create the information in this review is available on our Hard Drive Test Data webpage. You can download and use this data for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.
If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.
Good luck and let us know if you find anything interesting.