Backblaze Reed-Solomon

An open source Java library for erasure coding
Backblaze provides our unlimited online backup service to individuals, organizations, and businesses in over 140 countries. Key to operating this service is our ability to cost-effectively store data that can be recovered quickly, accurately, and efficiently.

For the past seven years we’ve used software RAID technology in our Backblaze Storage Pods to provide the file redundancy and reliability needed. When we designed Backblaze Vaults we took the opportunity to rethink our data storage and recovery strategies, and Backblaze Reed-Solomon erasure coding was born.

Putting Erasure Coding to Work
An erasure code takes a message, such as a data file, and produces a longer message in such a way that the original can be reconstructed even if parts of the longer message are lost. Reed-Solomon is an erasure code with exactly the properties we needed for file storage and reliable recovery. It is straightforward to implement, and it is a reliable, well-proven technique that ensures an entire data element can be recovered even when one or more parts of the stored data element are lost or unavailable.
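
To make the idea concrete, here is a minimal sketch of the simplest possible erasure code: a single XOR parity shard. This is not Reed-Solomon and it is not the Backblaze library's API. The class name XorParityDemo and its methods are hypothetical, invented for illustration. XOR parity can rebuild only one lost shard, whereas Reed-Solomon extends the same principle to several parity shards so that several losses can be tolerated.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: single XOR parity, not Reed-Solomon.
final class XorParityDemo {

    /** Splits data into dataShards equal-size shards (zero-padded) plus one XOR parity shard. */
    static byte[][] encode(byte[] data, int dataShards) {
        int shardSize = (data.length + dataShards - 1) / dataShards;
        byte[][] shards = new byte[dataShards + 1][shardSize];
        for (int i = 0; i < data.length; i++) {
            shards[i / shardSize][i % shardSize] = data[i];
        }
        // The last shard holds the XOR of all data shards: the "longer message".
        for (int s = 0; s < dataShards; s++) {
            for (int j = 0; j < shardSize; j++) {
                shards[dataShards][j] ^= shards[s][j];
            }
        }
        return shards;
    }

    /** Rebuilds the one shard at index lost by XOR-ing every surviving shard. */
    static void recover(byte[][] shards, int lost) {
        int shardSize = shards[(lost + 1) % shards.length].length;
        byte[] rebuilt = new byte[shardSize];
        for (int s = 0; s < shards.length; s++) {
            if (s == lost) {
                continue;
            }
            for (int j = 0; j < shardSize; j++) {
                rebuilt[j] ^= shards[s][j];
            }
        }
        shards[lost] = rebuilt;
    }

    /** Concatenates the data shards and trims the zero padding. */
    static byte[] join(byte[][] shards, int originalLength) {
        int shardSize = shards[0].length;
        byte[] out = new byte[originalLength];
        for (int i = 0; i < originalLength; i++) {
            out[i] = shards[i / shardSize][i % shardSize];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] message = "Backblaze Vault".getBytes(StandardCharsets.UTF_8);
        byte[][] shards = encode(message, 4);   // 4 data shards + 1 parity shard
        shards[2] = null;                       // simulate a failed drive
        recover(shards, 2);
        System.out.println(new String(join(shards, message.length), StandardCharsets.UTF_8));
    }
}
```

In this sketch any one of the five shards can be lost and rebuilt from the other four. Losing two shards at once is unrecoverable here, which is the limitation that a code with multiple parity shards, such as Reed-Solomon, is designed to remove.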

The practical application for Backblaze is that in a cloud-scale datacenter, you have to assume that hard drives containing terabytes of data will die on a regular basis. The Backblaze Vault Architecture, utilizing our Reed-Solomon erasure coding implementation, is durable by design so you can trust that your data is safe.

Open Source
We are releasing Backblaze Reed-Solomon as open source. The code is licensed under the MIT License, which means that you can use it in your own projects, including commercial ones, for free.

The source code is packaged in a ZIP file.

Download Source Code
(14K ZIP file, 94K on disk)
You can also download the ZIP file from Backblaze on GitHub.

More Information
Check out our blog post on Backblaze Reed-Solomon to learn how Reed-Solomon works. It includes an example of how data can be encoded with a coding matrix and then completely recovered even after portions of the original data are lost.