Hyperscale Doesn’t Mean Hyper-Resilient


Last week, a misconfiguration in Google Cloud’s API infrastructure led to a major global outage. Not long before that, IBM Cloud suffered its second significant disruption in a matter of weeks. The incidents impacted everything from enterprise infrastructure to consumer-facing apps—Gmail, Spotify, Cloudflare, and countless internal systems built on top of these platforms.

Understandably, much of the coverage has focused on what went wrong. But the more important question might be: Why does something like this ripple so far and wide in a system supposedly built for resilience?

Single points of failure in a multi-service world

One might assume that as cloud providers scale, their reliability scales with them. However, these outages reveal a critical distinction: the difference between data-layer resilience and control-plane fragility.

The problem is that a robust data layer can be rendered useless if the “front door” is locked. Hyperscale cloud platforms have grown so interdependent and complex that a fault in one layer can bring vast swaths of unrelated services to a halt. This is the risk of vertical integration: When one vendor provides compute, storage, networking, and identity, a simple bug or misconfiguration can cascade through thousands of applications—not because the applications are fragile, but because they’ve all tied themselves to the same operational backbone.

Redundancy, or the illusion of it?

In theory, cloud architecture encourages redundancy. But in practice, many companies—even those using multi-cloud strategies—tend to consolidate key services like authentication and orchestration with a single vendor. When that vendor’s services go down, it doesn’t matter that your data is replicated across three availability zones in the same region. If you can’t log in to access it, your redundancy becomes purely theoretical.

After last week’s outages, some companies may re-evaluate their cloud strategy—but it’s not as easy as flipping a switch. True diversification is complex, requiring time, engineering resources, and a cultural shift toward designing for failure.

The reality: Fewer assumptions, more contingencies

The knee-jerk reaction to events like these is often to demand better SLAs, more transparency, or faster recovery times. Those are valid asks. But they might miss the deeper lesson: Assumptions about uptime and “X-nines” reliability are only helpful until the moment they aren’t. What users need are not just better guarantees, but clearer paths to self-determination when things break.

That might look like:

  • Designing for graceful degradation. What can your service do when its cloud provider is partially offline?
  • Reconsidering dependencies. Are you tying core logic to a provider’s proprietary APIs, or abstracting where possible?
  • Asking harder questions during vendor selection. Not just, “Can it scale?” but “What happens when it fails?”
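To make the first of those points concrete, here is a minimal sketch of graceful degradation: serve fresh data when the provider answers, and fall back to a stale local copy when it doesn’t. The `fetch_from_provider` function and `ProviderError` below are hypothetical stand-ins for a real SDK call, used only to illustrate the pattern.

```python
class ProviderError(Exception):
    """Raised when the (hypothetical) cloud provider call fails."""

def fetch_from_provider(key, provider_up):
    # Stand-in for a real SDK call; fails when the provider is down.
    if not provider_up:
        raise ProviderError(f"provider unreachable for {key}")
    return f"fresh:{key}"

def fetch_with_degradation(key, cache, provider_up=True):
    """Serve fresh data when possible; degrade to a stale cached copy otherwise."""
    try:
        value = fetch_from_provider(key, provider_up)
        cache[key] = value          # refresh the local cache on success
        return value, "fresh"
    except ProviderError:
        if key in cache:            # degraded mode: stale but available
            return cache[key], "stale"
        raise                       # nothing cached: the failure surfaces

cache = {}
print(fetch_with_degradation("profile", cache, provider_up=True))   # ('fresh:profile', 'fresh')
print(fetch_with_degradation("profile", cache, provider_up=False))  # ('fresh:profile', 'stale')
```

The point isn’t the caching itself; it’s that the degraded path is designed and tested ahead of time, so an upstream outage produces a reduced service rather than no service.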

Case study: Sardius Media bakes in redundancy

Sardius Media, a global video platform, built cloud redundancy into its DNA. Every piece of media is replicated across multiple S3-compatible storage providers—including Backblaze B2—using a proprietary “race” mechanism that delivers the fastest, most reliable storage experience for end users. This architecture keeps files available, resilient, and protected, even if one provider has an outage.

  • No single point of failure: Content lives across multiple clouds
  • Best performance: Requests race to the fastest provider in real time
  • Durability, affordability, and global reach: Backblaze B2 wins the race up to 80% of the time globally

Sardius Media’s strategy proves that open, multi-provider storage isn’t just theory—it’s operational resilience in action.
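Sardius Media’s actual implementation is proprietary, but the “race” idea can be sketched in a few lines: issue the same read to every provider concurrently and return the first successful response, so a slow or failed provider simply loses the race. The provider names, latencies, and failure behavior below are invented for illustration.

```python
import concurrent.futures
import time

def make_provider(name, latency, healthy=True):
    """Simulated S3-compatible provider with a fixed response latency."""
    def get_object(key):
        time.sleep(latency)
        if not healthy:
            raise IOError(f"{name}: outage")
        return f"{name}:{key}"
    return get_object

def race_get(providers, key):
    """Issue the same GET to every provider; return the first success."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = [pool.submit(p, key) for p in providers]
        for fut in concurrent.futures.as_completed(futures):
            try:
                return fut.result()   # first provider to answer wins the race
            except IOError:
                continue              # a failed provider just loses the race
    raise IOError("all providers failed")

providers = [
    make_provider("slow-cloud", 0.2),
    make_provider("fast-cloud", 0.05),
    make_provider("down-cloud", 0.0, healthy=False),
]
print(race_get(providers, "video.mp4"))  # fast-cloud answers first: 'fast-cloud:video.mp4'
```

Note the resilience property: the outage of “down-cloud” is invisible to the caller, because failure and slowness are handled by the same mechanism.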

What does that mean for you?

The answer isn’t to abandon the cloud, but to get smarter about how you use it. This means architecting systems that don’t just have data redundancy, but true operational independence.

Maybe that means replicating data to providers who specialize rather than consolidate. Or maybe it just means revisiting architectures that have become too reliant on invisible scaffolding.

What’s clear is this: reliability isn’t a feature you buy from the cloud. It’s a design philosophy that must be shared.

About David Johnson

David Johnson is a Product Marketing Manager at Backblaze, where he specializes in cloud backup and archiving for businesses. Having built the product marketing function at Vultr, he brings deep knowledge of the cloud infrastructure industry to Backblaze. David's passion for technology means his basement is a mini data center filled with homelab projects, where he spends his free time deepening his expertise in all things backup and archive. Connect with him on LinkedIn.