Resiliency, Durability, and Availability
Backblaze B2 Cloud Storage uses the Backblaze Vault architecture to provide a highly resilient, durable, and available storage service. Backblaze calculates the annual durability of this architecture at 99.999999999% (eleven nines).
Distributing Data
A Backblaze Vault consists of 20 storage pods, with the data evenly spread across all 20 pods. Each storage pod in a given vault has the same number of drives, and the drives are all the same size.
Drives in the same drive position in each of the 20 storage pods are grouped together into a storage unit called a “tome.” Each file is stored in one tome, and is spread out across the tome for reliability and availability.
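The tome layout described above can be sketched in a few lines of Python. This is only an illustration of the grouping rule, not Backblaze's software: the `Drive` tuple, the `tome` function, and the zero-based pod and position indices are hypothetical names and conventions chosen for the example.

```python
from __future__ import annotations

from typing import NamedTuple

PODS_PER_VAULT = 20  # from the text: a vault consists of 20 storage pods


class Drive(NamedTuple):
    pod: int       # hypothetical zero-based pod index within the vault
    position: int  # hypothetical zero-based drive slot within the pod


def tome(position: int, pods: int = PODS_PER_VAULT) -> list[Drive]:
    """Return the drives that form the tome at one drive position.

    A tome takes the drive at the same position from every pod,
    so it always spans all pods in the vault.
    """
    return [Drive(pod, position) for pod in range(pods)]


t = tome(7)
print(len(t))       # 20
print(t[0], t[-1])  # Drive(pod=0, position=7) Drive(pod=19, position=7)
```

Because every pod has the same number of drives, a vault has one tome per drive position, and each tome touches every pod exactly once.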
Every file uploaded to a Backblaze Vault is broken into pieces before being stored. Each of those pieces is called a “shard.” Parity shards add redundancy so that a file can be fetched from a Backblaze Vault even if some of the pieces are not available.
Each file is stored as 20 shards: 17 data shards and three parity shards. Because those shards are distributed across 20 storage pods in 20 cabinets, the Backblaze Vault is resilient to the failure of a storage pod, power loss to an entire cabinet, or even a cabinet-level networking outage.
Files can still be written to a Backblaze Vault when one pod is down, and they still have two parity shards to protect the data. Even in the extreme and unlikely case where three storage pods in a Backblaze Vault are offline, the files in the vault remain available because they can be reconstructed from the 17 shards that are still reachable.
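The tolerance arithmetic above can be checked with a short Python sketch. It assumes one shard per pod, as the text describes; the function names `is_readable` and `spare_parity` are hypothetical, and the overhead figure ignores padding and metadata.

```python
DATA_SHARDS = 17
PARITY_SHARDS = 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS  # 20, one shard per pod


def is_readable(offline_pods: int) -> bool:
    """A file is readable while at least 17 of its 20 shards are reachable."""
    return TOTAL_SHARDS - offline_pods >= DATA_SHARDS


def spare_parity(offline_pods: int) -> int:
    """Number of further shards that can be lost before the file is unreadable."""
    return max(0, PARITY_SHARDS - offline_pods)


for offline in range(5):
    print(offline, is_readable(offline), spare_parity(offline))

# Raw bytes stored per byte of user data (ignoring padding and metadata).
overhead = TOTAL_SHARDS / DATA_SHARDS
print(f"{overhead:.3f}")  # 1.176
```

With one pod offline the sketch reports two spare parity shards, and with three pods offline the file is still readable but has no margin left. A fourth failure would make it unreadable.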
Reed-Solomon Erasure Coding Implementation
Like some redundant array of independent disks (RAID) implementations, the Backblaze Vault software uses Reed-Solomon erasure coding to create the parity shards. Unlike Linux software RAID, which offers only one or two parity blocks, the Backblaze Vault software allows an arbitrary mix of data and parity shards. Backblaze currently uses 17 data shards plus three parity shards.
The beauty of Reed-Solomon is that the original file can be re-created from any 17 of the 20 shards. If one of the original data shards is unavailable, it can be re-computed from the other 16 data shards plus one of the parity shards. Even if three of the original data shards are unavailable, they can be re-created from the remaining 14 data shards and the three parity shards.
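The "any 17 of 20" property can be demonstrated with a toy Reed-Solomon-style code in Python. This is a minimal sketch under stated assumptions, not the Backblaze Vault implementation: it works in the prime field of integers modulo 257 and uses Lagrange interpolation, whereas production implementations typically work in GF(2^8) so that every symbol fits in one byte. It also stores a single symbol per shard, while a real shard holds many bytes that are coded position by position. The names `interpolate`, `encode`, and `reconstruct` are hypothetical.

```python
from __future__ import annotations

P = 257  # prime modulus (assumption for this toy); bytes 0..255 are field elements


def interpolate(points: dict[int, int], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) that
    passes through the given {x_i: y_i} points, modulo P (Lagrange form)."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: list[int], parity: int) -> list[int]:
    """Systematic encoding: keep the data symbols and append parity symbols.

    Data symbol i is the polynomial's value at x = i; parity symbols are
    the same polynomial evaluated at the next `parity` points.
    """
    k = len(data)
    points = dict(enumerate(data))
    return data + [interpolate(points, x) for x in range(k, k + parity)]


def reconstruct(shards: list[int | None], k: int) -> list[int]:
    """Recover the k data symbols from any k surviving shards (None = lost)."""
    known = {x: y for x, y in enumerate(shards) if y is not None}
    if len(known) < k:
        raise ValueError("not enough shards to reconstruct")
    # Any k points determine the polynomial, so use exactly k of them.
    points = dict(list(known.items())[:k])
    return [known[x] if x in known else interpolate(points, x) for x in range(k)]


data = list(b"Backblaze Vault!!")  # 17 symbols, one per data shard
shards = encode(data, parity=3)    # 20 symbols in total

damaged = list(shards)
for lost in (0, 5, 16):            # lose three of the data shards
    damaged[lost] = None

print(bytes(reconstruct(damaged, k=17)))  # b'Backblaze Vault!!'
```

Losing a fourth shard leaves only 16 points, which is not enough to pin down a polynomial of degree below 17, so `reconstruct` raises an error in that case.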