What To Do When You Get a B2 503 (or 500) Server Error

By Brian Wilson | August 16th, 2018

Just try again — it’s free, easy, and will work.

Seriously, that’s it. Occasionally, I’ll see questions that amount to, “I’m getting a 503 error; does that mean B2 is down?” The short answer is no. B2 is not down. The error simply means that B2 is functioning as designed as the most affordable, easy-to-use cloud storage service on the planet. I wanted to take today’s post to go into a bit more detail on how to handle a 500 or 503 error.

As we’ve described in our developer docs, the best decision is to write your integration so that it retries in the event of a 500 or 503. This modest amount of upfront work will result in a stable and transparent long-term experience.

The Backblaze Contract Architecture

To understand the vast majority of B2 500 and 503 errors, it’s helpful to go into the “contract architecture” for B2. To create a service that is fully scalable at incredibly low cost, Backblaze has had to innovate in a number of areas. One way is what we refer to as “contract architecture.” It’s the approach that let us cut a large expense in traditional cloud storage infrastructure — high bandwidth load balancers for uploads.

Here’s how it works: when a client wants to push data to Backblaze, it contacts a “dispatching server.” That dispatching server figures out where the data will ultimately live inside a given Backblaze data center.

The dispatching server tells the client “there is space over on vault-9015.”

Armed with that information (and an auth token), the client ends its connection with the dispatching server and creates a brand new request directly to vault-9015. The “contract” concept is not novel: ultimately, all APIs are contracts between two entities (machines). In the B2 case, our design leverages that insight as the client and vault negotiate how they will work together. In this example, once authenticated, the client continues to transmit to vault-9015 until it’s done or the vault fills up (or happens to go offline). In those instances, all the client has to do is return to the dispatching server to get information for the next available vault. This is a relatively trivial step and can be easily handled at the software level.
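The two-step handshake described above can be sketched as a small in-memory simulation. Every name here (`Dispatcher`, `Vault`, `UploadTarget`, `get_upload_target`, the token format, and the "most free space" policy) is hypothetical and only stands in for the real service. It illustrates the flow and is not Backblaze's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Vault:
    """Hypothetical stand-in for a storage vault with limited free space."""
    name: str
    free_bytes: int
    stored: list = field(default_factory=list)

    def upload(self, token: str, data: bytes) -> int:
        """Return an HTTP-like status: 200 on success, 503 when full."""
        if token != f"token-for-{self.name}":
            return 401
        if len(data) > self.free_bytes:
            return 503
        self.free_bytes -= len(data)
        self.stored.append(data)
        return 200


@dataclass
class UploadTarget:
    vault: Vault
    token: str


class Dispatcher:
    """Hypothetical dispatching server: it hands out targets, never handles data."""

    def __init__(self, vaults):
        self.vaults = vaults

    def get_upload_target(self) -> UploadTarget:
        # Assumed policy for this sketch: pick the vault with the most free space.
        vault = max(self.vaults, key=lambda v: v.free_bytes)
        return UploadTarget(vault, f"token-for-{vault.name}")


dispatcher = Dispatcher([Vault("vault-9015", 10), Vault("vault-9016", 4)])
target = dispatcher.get_upload_target()                # step 1: ask where to go
status = target.vault.upload(target.token, b"hello")   # step 2: talk to the vault directly
```

The point of the sketch is that the data itself never passes through the dispatcher, which is why no high bandwidth load balancer is needed in front of the vaults.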

What Causes a B2 500 or 503 Error Response?

The client knows when to go back to the dispatching server because it receives (wait for it) a 500 or 503 error from vault-9015. The system is designed to send a firm message that says, in effect, “stop uploading to vault-9015.” We documented the specifics in the B2 error handling protocols. The bottom line is that an error in the 500 block should be interpreted by the client as the signal to GO BACK to the dispatching server and ask for a new vault for uploads. Rinse and repeat. It’s a free process that causes negligible incremental overhead and no charges to the customer. Unlike other services, B2 uploads and upload transactions are free.
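As a rough illustration of "go back and ask for a new vault," here is a minimal retry loop. The two callables are placeholders you would supply yourself: `get_upload_target` stands in for whatever call asks the dispatching server for a new upload location, and `upload` stands in for the actual transfer and returns the HTTP status code. Both names, and the `max_attempts` limit, are assumptions made for this sketch.

```python
RETRYABLE = {500, 503}


def upload_with_retry(get_upload_target, upload, data, max_attempts=5):
    """Try an upload, asking for a fresh target after every 500 or 503.

    get_upload_target() -> target   (hypothetical: asks the dispatching server)
    upload(target, data) -> int     (hypothetical: returns the HTTP status)
    Returns (status, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        target = get_upload_target()      # always start from a fresh target
        status = upload(target, data)
        if status not in RETRYABLE:
            return status, attempt        # success, or an error retrying won't fix
    raise RuntimeError(f"still failing after {max_attempts} attempts")


# Simulated run: the first vault says "stop uploading here", the second accepts.
targets = iter(["vault-9015", "vault-9016"])
responses = {"vault-9015": 503, "vault-9016": 200}
status, attempts = upload_with_retry(
    lambda: next(targets),
    lambda target, data: responses[target],
    b"payload",
)
```

Non-retryable statuses (a 4xx, for example) are returned to the caller right away, since asking for a new vault would not help with those.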

What if, after getting a 503 and asking the dispatching server for a new URL, you try to upload and get ANOTHER 503 from the new vault? To address this unusual case, write your software to pause for a few seconds, then go back to the dispatching server. In this scenario, you have hit a statistically unusual situation: you were told to go to a vault with very little space left, and somebody else got there first and filled up that space. The second 503 is a sign the system is functioning as designed. Your program can elegantly handle it by going back to the dispatching server.
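One way to express "pause for a few seconds after a second failure" is sketched below. The first retry happens immediately, and a pause is added only once two attempts in a row have failed. The doubling delay, the 2-second starting point, the 30-second cap, and the function names are all assumptions of this sketch; the only guidance in the text above is "a few seconds." The `sleep` parameter is injectable so the example can run without actually waiting.

```python
import time


def pause_seconds(consecutive_pauses, base=2.0, cap=30.0):
    """Assumed schedule: 2s, 4s, 8s, ... capped at `cap` seconds."""
    return min(cap, base * 2 ** (consecutive_pauses - 1))


def upload_with_pause(get_upload_target, upload, data,
                      max_attempts=5, sleep=time.sleep):
    """Retry on 500/503; pause before the third and later attempts."""
    for attempt in range(1, max_attempts + 1):
        if attempt > 2:
            # The previous two attempts both failed: wait before asking again.
            sleep(pause_seconds(attempt - 2))
        status = upload(get_upload_target(), data)
        if status not in (500, 503):
            return status
    raise RuntimeError(f"still failing after {max_attempts} attempts")


# Simulated run: two 503s in a row, then success. Pauses are recorded, not slept.
statuses = iter([503, 503, 200])
pauses = []
result = upload_with_pause(
    lambda: "some-vault",                  # placeholder target
    lambda target, data: next(statuses),
    b"payload",
    sleep=pauses.append,
)
```

In the simulated run, exactly one pause is recorded, between the second 503 and the third attempt.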

Other services, notably Amazon S3, provide the client with a “well known URL.” The client can merrily push data to the URL and Amazon handles load balancing and finding open storage space after receiving the data. That’s a totally valid approach, but objectively more expensive as it involves high bandwidth load balancers. There are other interesting implications to the load balancing scenario. If you’re interested, I wrote a blog post on the difference between the two approaches.

As I discussed in that post, the contract architecture does introduce some complexity when the client has to go back to the dispatching server. But, for that modest amount of error handling upfront, we help fuel Backblaze B2 as an infinitely scalable, fully sustainable service that has been and will continue to be the affordability leader in the object storage market.

Brian Wilson
I completed my undergraduate degree at Oregon State University in 1990, then completed a master's degree at Stanford in 1991. Ever since then I've worked at various companies as a software engineer, and in the last few years I have started my own software startups: MailFrontier (started in 2002) and most recently Backblaze (started in 2007).

I have a personal web site at http://www.ski-epic.com that I started in 1999 (it was originally just for one vacation, but it kept growing) where I put up my vacation pictures and videos. Nothing professional, it's all just for fun.

In my spare time I enjoy skiing, motorcycling, and boating. I have been lucky enough to travel to a few countries, and I enjoy scouting out new places for the first time.
