How We Achieved Upload Speeds Faster Than AWS S3


You don’t always need the absolute fastest cloud storage—your performance requirements depend on your use case, business objectives, and security needs. But still, faster is usually better. And Backblaze just announced an upgrade to B2 Cloud Storage that delivers a lot more speed: most file uploads will now be up to 30% faster than AWS S3.

Today, I’m diving into all of the details of this performance improvement, how we did it, and what it means for you.

The TL;DR

The Results: Customers who rely on small file uploads (1MB or less) can expect to see 10–30% faster uploads on average based on our tests, all without any change to durability, availability, or pricing. 

What Does This Mean for You? 

All B2 Cloud Storage customers will benefit from these performance enhancements, especially those who use Backblaze B2 as a storage destination for data protection software. Small uploads of 1MB or less make up about 70% of all uploads to B2 Cloud Storage and are common for backup and archive workflows. Specific benefits of the performance upgrades include:

  • Securing data in offsite backups faster.
  • Freeing up time for IT administrators to work on other projects.
  • Decreasing congestion on network bandwidth.
  • Deduplicating data more efficiently.

Veeam® is dedicated to working alongside our partners to innovate and create a united front against cyber threats and attacks. The new performance improvements released by Backblaze for B2 Cloud Storage further our mission to provide radical resilience to our joint customers.

—Andreas Neufert, Vice President, Product Management, Alliances, Veeam

When Can I Expect Faster Uploads?

Today. The performance upgrades have been fully rolled out across Backblaze’s global data regions.

How We Did It

Prior to this work, when a customer uploaded a file to Backblaze B2, the data was written to multiple hard disk drives (HDDs). Those operations had to be completed before returning a response to the client. Now, we write the incoming data to the same HDDs and also, simultaneously, to a pool of solid state drives (SSDs) we call a “shard stash,” waiting only for the HDD writes to make it to the filesystems’ in-memory caches and the SSD writes to complete before returning a response. Once the writes to HDD are complete, we free up the space from the SSDs so it can be reused.

Since writing data to an SSD is much faster than writing to HDDs, the net result is faster uploads. 

That’s just a brief summary; if you’re interested in the technical details (as well as the results of some rigorous testing), read on!

The Path to Performance Upgrades

As you might recall from many Drive Stats blog posts and webinars, Backblaze stores all customer data on HDDs, affectionately termed ‘spinning rust’ by some. We’ve historically reserved SSDs for Storage Pod (storage server) boot drives. 

Until now. 

That’s right—SSDs have entered the data storage chat. To achieve these performance improvements, we combined the performance of SSDs with the cost efficiency of HDDs. First, I’ll dig into a bit of history to add some context to how we went about the upgrades.

HDD vs. SSD

IBM shipped the first hard drive way back in 1957, so it’s fair to say that the HDD is a mature technology. Drive capacity and data rates have steadily increased over the decades while cost per byte has fallen dramatically. That first hard drive, the IBM RAMAC 350, had a total capacity of 3.75MB, and cost $34,500. Adjusting for inflation, that’s about $375,000, equating to $100,000 per MB, or $100 billion per TB, in 2023 dollars.

A photograph of people pushing one of the first hard disk drives into a truck.
An early hard drive shipped by IBM. Source.

Today, the 16TB version of the Seagate Exos X16—an HDD widely deployed in the Backblaze B2 Storage Cloud—retails for around $260, or $16.25 per TB. If it had the same cost per byte as the IBM RAMAC 350, it would sell for $1.6 trillion—around the current GDP of Australia!

SSDs, by contrast, have only been around since 1991, when SanDisk’s 20MB drive shipped in IBM ThinkPad laptops for an OEM price of about $1,000. Let’s consider a modern SSD: the 3.2TB Micron 7450 MAX. Retailing at around $360, the Micron SSD is priced at $112.50 per TB, nearly seven times as much as the Seagate HDD.

So, HDDs easily beat SSDs in terms of storage cost, but what about performance? Here are the numbers from the manufacturers’ data sheets:

                                     Seagate Exos X16   Micron 7450 MAX
Model number                         ST16000NM001G      MTFDKCB3T2TFS
Capacity                             16TB               3.2TB
Drive cost                           $260               $360
Cost per TB                          $16.25             $112.50
Max sustained read rate (MB/s)       261                6,800
Max sustained write rate (MB/s)      261                5,300
Random read rate, 4kB blocks, IOPS   170/440*           1,000,000
Random write rate, 4kB blocks, IOPS  170/440*           390,000

Since HDD platters rotate at a constant rate, 7,200 RPM in this case, they can transfer more blocks per revolution at the outer edge of the disk than close to the middle—hence the two figures for the X16’s random read and write rates.

The SSD is over 20 times faster at sustained data transfer than the HDD, but look at the difference in random transfer rates! Even when the HDD is at its fastest, transferring blocks from the outer edge of the disk, the SSD is over 2,200 times faster reading data and nearly 900 times faster for writes.

This massive difference is due to the fact that, when reading data from random locations on the disk, the platters have to complete an average of 0.5 revolutions between blocks. At 7,200 RPM, that means the HDD spends about 4.2ms just spinning to the next block before it can even transfer data. In contrast, the SSD’s data sheet quotes its latency as just 80µs (that’s 0.08ms) for reads and 15µs (0.015ms) for writes, between roughly 50 and 280 times faster than the spinning disk.

Let’s consider a real-world operation, say, writing 64kB of data. Assuming the HDD can write that data to sequential disk sectors, it will spin for an average of 4.2ms, then spend 0.25ms writing the data to the disk, for a total of 4.5ms. The SSD, in contrast, can write the data to any location instantaneously, taking just 27µs (0.027ms) to do so. This (somewhat theoretical) 167x speed advantage is the basis for the performance improvement.
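If you’d like to check that arithmetic, here’s a short Python sketch that reproduces the figures from the data sheet numbers quoted above. Nothing in it is measured; it simply restates the back-of-the-envelope calculation.

```python
# Back-of-the-envelope timing for a 64kB write, using the data sheet
# figures quoted above (not measured values).
KB, MB = 1_000, 1_000_000
block_bytes = 64 * KB

# HDD: Seagate Exos X16
hdd_rpm = 7_200
hdd_rotational_ms = 0.5 * (60_000 / hdd_rpm)          # half a revolution: ~4.2ms
hdd_transfer_ms = block_bytes / (261 * MB) * 1_000    # ~0.25ms at 261MB/s
hdd_total_ms = hdd_rotational_ms + hdd_transfer_ms    # ~4.4ms, rounded to 4.5ms above

# SSD: Micron 7450 MAX
ssd_latency_ms = 0.015                                 # 15µs write latency
ssd_transfer_ms = block_bytes / (5_300 * MB) * 1_000   # ~0.012ms at 5,300MB/s
ssd_total_ms = ssd_latency_ms + ssd_transfer_ms        # ~0.027ms

print(f"HDD: {hdd_total_ms:.2f}ms  SSD: {ssd_total_ms:.3f}ms  "
      f"speedup: {hdd_total_ms / ssd_total_ms:.0f}x")  # roughly the 167x figure above
```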

Why did I choose a 64kB block? As we mentioned in a recent blog post focusing on cloud storage performance, in general, bigger files are better when it comes to the aggregate time required to upload a dataset. However, there may be other requirements that push for smaller files. Many backup applications split data into fixed-size blocks for upload as files to cloud object storage. There is a trade-off in choosing the block size: larger blocks improve backup speed, but smaller blocks reduce the amount of storage required. In practice, backup blocks may be as small as 1MB or even 256kB. The 64kB blocks we used in the calculation above represent the shards that comprise a 1MB file.

The challenge facing our engineers was to take advantage of the speed of solid state storage to accelerate small file uploads without breaking the bank.

Improving Write Performance for Small Files

When a client application uploads a file to the Backblaze B2 Storage Cloud, a coordinator pod splits the file into 16 data shards, creates four additional parity shards, and writes the resulting 20 shards to 20 different HDDs, each in a different Pod.

Note: As HDD capacity increases, so does the time required to recover after a drive failure, so we periodically adjust the ratio between data shards and parity shards to maintain our eleven nines durability target. In the past, you’ve heard us talk about 17 + 3 as the ratio, but we also run 16 + 4, and our very newest vaults use a 15 + 5 scheme.
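To make the 16 + 4 split concrete, here’s a toy Python sketch. Only the shard counts and sizes come from the description above; the parity helper is a stand-in, since real parity shards are produced by a proper erasure code (such as Reed-Solomon), not anything this simple.

```python
# Toy illustration of the 16 data + 4 parity shard layout described above.
# The parity helper is a placeholder; real parity shards come from an
# erasure code such as Reed-Solomon, not from this stub.
DATA_SHARDS, PARITY_SHARDS = 16, 4

def split_into_data_shards(data: bytes) -> list[bytes]:
    """Split a file into 16 equal-size shards, padding the last one."""
    shard_size = -(-len(data) // DATA_SHARDS)          # ceiling division
    padded = data.ljust(shard_size * DATA_SHARDS, b"\0")
    return [padded[i * shard_size:(i + 1) * shard_size]
            for i in range(DATA_SHARDS)]

def make_parity_shards(data_shards: list[bytes]) -> list[bytes]:
    """Placeholder only; stands in for the real erasure coding step."""
    return [bytes(len(data_shards[0])) for _ in range(PARITY_SHARDS)]

data_shards = split_into_data_shards(bytes(1_000_000))   # a 1MB upload
all_shards = data_shards + make_parity_shards(data_shards)

print(len(all_shards), "shards of", len(all_shards[0]), "bytes")
# 20 shards of 62,500 bytes: each about the 64kB used in the calculation above.
```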

Each Pod writes the incoming shard to its local filesystem; in practice, this means that the data is written to an in-memory cache and will be written to the physical disk at some point in the near future. Any requests for the file can be satisfied from the cache, but the data hasn’t actually been persistently stored yet.

We need to be absolutely certain that the shards have been written to disk before we return a “success” response to the client, so each Pod executes an fsync system call to transfer (“flush”) the shard data from system memory through the HDD’s write cache to the disk itself before returning its status to the coordinator. When the coordinator has received at least 19 successful responses, it returns a success response to the client. This ensures that, even if the entire data center were to lose power immediately after the upload, the data would be preserved.
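To illustrate the difference between “written to the filesystem” and “flushed to disk,” here’s a minimal Python sketch of what each Pod has to guarantee before it can report success. It’s illustrative only, not Backblaze’s actual storage code, and it omits everything but the happy path.

```python
import os

def persist_shard(path: str, shard: bytes) -> None:
    """Write a shard and only return once it has really reached the disk.

    write() alone leaves the data in the filesystem's in-memory cache;
    fsync() blocks until it has been flushed through to stable storage,
    which is the slow part on a spinning disk.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, shard)
        os.fsync(fd)   # don't acknowledge the coordinator before this returns
    finally:
        os.close(fd)
```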

As we explained above, for small blocks of data, the vast majority of the time spent writing the data to disk is spent waiting for the drive platter to spin to the correct location. Writing shards to SSD could result in a significant performance gain for small files, but what about that 7x cost difference?

Our engineers came up with a way to have our cake and eat it too by harnessing the speed of SSDs without a massive increase in cost. Now, upon receiving a file of 1MB or less, the coordinator splits it into shards as before, then simultaneously sends the shards to a set of 20 Pods and a separate pool of servers, each populated with 10 of the Micron SSDs described above—a “shard stash.” The shard stash servers easily win the “flush the data to disk” race and return their status to the coordinator in just a few milliseconds. Meanwhile, each HDD Pod writes its shard to the filesystem, queues up a task to flush the shard data to the disk, and returns an acknowledgement to the coordinator.

Once the coordinator has received replies establishing that at least 19 of the 20 Pods have written their shards to the filesystem, and at least 19 of the 20 shards have been flushed to the SSDs, it returns its response to the client. Again, if power were to fail at this point, the data would already have been safely written to solid state storage.

We don’t want to leave the data on the SSDs any longer than we have to, so each Pod, once it has finished flushing its shard to disk, signals the shard stash that it can purge its copy of the shard.
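Here’s a simplified, runnable sketch of that coordinator flow using Python’s asyncio. The 20-shard and 19-acknowledgement numbers mirror the description above; the HddPod and SsdStash classes are invented stand-ins for Backblaze’s internal services, with sleeps in place of real writes, and the later purge of the SSD copies is left out.

```python
import asyncio, random

SHARD_COUNT = 20   # 16 data + 4 parity shards
QUORUM = 19        # acknowledgements required before answering the client

# Invented stand-ins for Backblaze's internal services; sleeps model write times.
class HddPod:
    async def write(self, shard: bytes) -> None:
        await asyncio.sleep(random.uniform(0.002, 0.010))   # into filesystem cache

class SsdStash:
    async def write(self, shard: bytes) -> None:
        await asyncio.sleep(random.uniform(0.0005, 0.002))  # flushed to SSD

async def wait_for_quorum(tasks: list[asyncio.Task], quorum: int) -> None:
    done = 0
    for finished in asyncio.as_completed(tasks):
        await finished
        done += 1
        if done >= quorum:
            return

async def upload_small_file(shards: list[bytes]) -> str:
    pods = [HddPod() for _ in range(SHARD_COUNT)]
    stash = SsdStash()
    hdd_writes = [asyncio.create_task(pod.write(s)) for pod, s in zip(pods, shards)]
    ssd_writes = [asyncio.create_task(stash.write(s)) for s in shards]

    # Respond once 19/20 shards are in the Pods' filesystem caches AND
    # 19/20 shards are safely flushed to the SSD shard stash.
    await wait_for_quorum(hdd_writes, QUORUM)
    await wait_for_quorum(ssd_writes, QUORUM)
    return "success"

print(asyncio.run(upload_small_file([b"shard"] * SHARD_COUNT)))
```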

Real-World Performance Gains

As I mentioned above, that calculated 167x performance advantage of SSDs over HDDs is somewhat theoretical. In the real world, the time required to upload a file also depends on a number of other factors—proximity to the data center, network speed, and all of the software and hardware between the client application and the storage device, to name a few.

The first Backblaze region to receive the performance upgrade was U.S. East, located in Reston, Virginia. Over a 12-day period following the shard stash deployment there, the average time to upload a 256kB file was 118ms, while a 1MB file clocked in at 137ms. To replicate a typical customer environment, we ran the test application at our partner Vultr’s New Jersey data center, uploading data to Backblaze B2 across the public internet.

For comparison, we ran the same test against Amazon S3’s U.S. East (Northern Virginia) region, a.k.a. us-east-1, from the same machine in New Jersey. On average, uploading a 256kB file to S3 took 157ms, with a 1MB file taking 153ms.

So, comparing the Backblaze B2 U.S. East region to the Amazon S3 equivalent, we benchmarked the new, improved Backblaze B2 as 30% faster than S3 for 256kB files and 10% faster than S3 for 1MB files.

These low-level tests were confirmed when we timed Veeam Backup & Replication software backing up 1TB of virtual machines with 256kB block sizes. Backing the server up to Amazon S3 took three hours and 12 minutes; we measured the same backup to Backblaze B2 at just two hours and 15 minutes, 40% faster than S3.

Test Methodology

We wrote a simple Python test app using the AWS SDK for Python (Boto3). Each test run involved timing 100 file uploads using the S3 PutObject API, with a 10ms delay between each upload. (FYI, the delay is not included in the measured time.) The test app used a single HTTPS connection across the test run, following best practice for API usage. We’ve been running the test on a VM in Vultr’s New Jersey region every six hours for the past few weeks against both our U.S. East region and its AWS neighbor. Latency to the Backblaze B2 API endpoint averaged 5.7ms, to the Amazon S3 API endpoint 7.8ms, as measured across 100 ping requests.
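For the curious, here’s roughly what such a test harness looks like. This is a reconstruction based on the description above rather than the actual test app; the endpoint and bucket name are placeholders, and credentials are assumed to come from the environment.

```python
import time
import boto3

# Placeholders: substitute your own bucket and your region's S3-compatible endpoint.
ENDPOINT = "https://s3.us-east-005.backblazeb2.com"
BUCKET = "upload-speed-test"
PAYLOAD = b"\0" * (256 * 1024)   # 256kB test object
RUNS = 100

# A single client reuses one HTTPS connection across the run.
s3 = boto3.client("s3", endpoint_url=ENDPOINT)

timings = []
for i in range(RUNS):
    start = time.perf_counter()
    s3.put_object(Bucket=BUCKET, Key=f"speed-test-{i}", Body=PAYLOAD)
    timings.append(time.perf_counter() - start)
    time.sleep(0.01)             # 10ms pause between uploads, not counted above

print(f"average upload time: {1_000 * sum(timings) / RUNS:.1f}ms")
```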

What’s Next?

At the time of writing, shard stash servers have been deployed to all of our data centers, across all of our regions. In fact, you might even have noticed small files uploading faster already. It’s important to note that this particular optimization is just one of a series of performance improvements that we’ve implemented, with more to come. It’s safe to say that all of our Backblaze B2 customers will enjoy faster uploads and downloads, no matter their storage workload.


About Pat Patterson

Pat Patterson is the chief technical evangelist at Backblaze. Over his three decades in the industry, Pat has built software and communities at Sun Microsystems, Salesforce, StreamSets, and Citrix. In his role at Backblaze, he creates and delivers content tailored to the needs of the hands-on technical professional, acts as the “voice of the developer” on the Product team, and actively participates in the wider technical community. Outside the office, Pat runs far, having completed ultramarathons up to the 50 mile distance. Catch up with Pat via Twitter or LinkedIn.