
When you’re moving exabytes of data, every network request, every CPU cycle, every byte matters. Recently, I had the chance to revisit a part of our system that’s been quietly humming along for years, and one small rethink gave our download performance a serious boost.
The idea was almost laughably simple: combine two separate requests into one. But when you’re operating at massive scale, even a “simple” change can make a huge difference.
Curious how we think about performance at scale?
Check out Analyzing Performance at Exabyte Scale and What Powers the Performance of Backblaze, both from our new series on engineering innovations, for a deeper dive into the engineering principles that drive our storage platform.
The challenge: Why we had 40 requests per download
Before the change, downloading a file meant:
- A “download coordinator” pod would reach out to each of the 20 pods that make up a Vault to grab shard metadata.
- Once it had those, it would figure out where the needed bytes lived.
- Then it would go back and request the actual data.
That meant two requests per pod, 40 in total, just to get the ball rolling on every download.
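To make that concrete, here’s a minimal sketch of the old two-phase pattern in Python. The pod objects and the fetch_shard_header/fetch_shard_data calls are hypothetical stand-ins for our internal RPCs, not actual Backblaze code:

```python
from dataclasses import dataclass

VAULT_PODS = 20  # pods that make up one Vault

@dataclass
class ShardHeader:  # stand-in for the real shard metadata
    offset: int
    length: int

def download_old(pods, file_id):
    """Old flow: two round trips to every pod (20 pods x 2 = 40 requests)."""
    # Phase 1: one request per pod just to fetch the shard header.
    headers = {pod: pod.fetch_shard_header(file_id) for pod in pods}
    # Phase 2: a second request per pod for the bytes themselves.
    return [pod.fetch_shard_data(file_id, h.offset, h.length)
            for pod, h in headers.items()]
```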
The fix: Smarter reads with half the overhead
At some point, it clicked for me: why were we doing this in two steps? The original setup only pulled the bare minimum of data. But what if we just grabbed everything we needed at once? There wasn’t a good reason not to. So I refactored the process so that a pod could grab both the shard header and the data in a single request.
Now:
- The coordinator still orchestrates the work.
- The receiving pod reads the header, figures out what it needs, and pulls the data—all internally. By shifting this responsibility to the receiving pod, we eliminate a network round trip per pod—20 round trips in total.
- The combined result is sent back to the coordinator in a single step.
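Sketched the same way (again with hypothetical helpers: fetch_shard on the coordinator side, a local_store on the pod side), the combined read collapses the two phases into one request per pod:

```python
def download_combined(pods, file_id):
    """New flow: a single round trip per pod (20 requests, not 40)."""
    return [pod.fetch_shard(file_id) for pod in pods]

# On the pod side, both reads now happen locally, against its own disk:
def handle_fetch_shard(local_store, file_id):
    header = local_store.read_header(file_id)              # local read, not a network hop
    return local_store.read(header.offset, header.length)  # same disk I/O as before
```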
After the fix, we’re still reading the same amount of data from disk, so disk I/O remains unchanged, but network performance improved significantly. Instead of kicking off 40 network operations, we’re down to about half that. Less traffic, less overhead, faster performance.
It was a simple idea, but the project required a significant amount of software engineering work all the same. Shifting responsibilities to the receiving pod meant teaching it to perform lots of just-in-time reasoning about the nature of the download, which required rethinking how we architected portions of the download code.
Why it didn’t just instantly double download performance
If you’re thinking, “shouldn’t that make downloads twice as fast?”—not quite.
Here’s why: Big files get broken into “stripes” during download, and my change only optimizes the first stripe request. Smaller files (a big chunk of our traffic) see the full benefit because they often fit into a single stripe. For larger files, though, the improvement only affects a small part of the overall download, so the impact is more limited.
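A back-of-the-envelope model shows why. Assuming, as a simplification on my part, that only the first stripe ever needed the extra header pass, the request counts look like this:

```python
def round_trips(stripes, pods=20, combined=False):
    """Requests per download; the first stripe needed a header pass before the fix."""
    first = pods if combined else 2 * pods  # the combined read removes 20 header requests
    rest = pods * (stripes - 1)             # later stripes were already one pass each
    return first + rest

for stripes in (1, 4, 100):
    before = round_trips(stripes)
    after = round_trips(stripes, combined=True)
    print(f"{stripes:>3} stripe(s): {before} -> {after} "
          f"({100 * (before - after) / before:.0f}% fewer requests)")
#   1 stripe(s):   40 ->   20 (50% fewer)  <- small files get the full win
# 100 stripe(s): 2020 -> 2000 (1% fewer)   <- big files barely notice
```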
How we measured the impact
Measuring the real-world effect turned out to be trickier than I expected. Our download traffic isn’t steady; it’s spiky. Under normal conditions, our system wasn’t hitting capacity limits, which made it hard to clearly see changes in download performance.
But in our dedicated performance testing environment, where we could send a controlled load of downloads, the improvement was crystal clear. With this change, our system could handle a much higher peak load—great news for handling things like backup surges, AI training runs, and large enterprise downloads.
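For a sense of what “controlled load” means here, this is the rough shape of such a test, with download_file standing in as a hypothetical helper rather than anything from our real harness:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_throughput(urls, concurrency):
    """Drive a fixed, known load and measure aggregate download throughput."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        sizes = list(pool.map(download_file, urls))  # hypothetical download helper
    elapsed = time.monotonic() - start
    return sum(sizes) / elapsed  # bytes per second at this load level

# Sweep concurrency upward until throughput stops climbing: that knee is the
# peak load the system can handle, which is what moved after this change.
```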
Beyond download performance: System-wide benefits
One of the coolest side effects? This doesn’t just help customer downloads. It also speeds up internal operations like recomputing data drives within a Vault and server-side copies.
By freeing up CPU cycles that used to be wasted on multiple requests, we open the door for better performance everywhere. And hey, maybe even some minor energy savings—less CPU load means less heat, less power.
What this taught me about optimization
When you’re trying to optimize a massive system, it’s tempting to chase performance with complicated solutions: more threads, smarter caches, fancier hardware.
But sometimes, the real win is just about thinking differently. Questioning assumptions. Asking, “Wait, why are we doing it this way?”
For me, this project was a great reminder that even at exabyte scale, the simplest solution can be the most impactful.