The Path to the S3 Compatible API: The Authentication Challenge

We launched our Backblaze S3 Compatible API in May of 2020 and released them for GA in July. After a launch, it’s easy to forget about the hard work that made it a reality. With that in mind, we’ve asked Malay Shah, our senior software engineering manager, to explain one of the challenges he found intriguing in the process. If you’re interested in developing your own APIs, or just curious about how ours have come to be, we think you’ll find Malay’s perspective interesting.

When we started building our Backblaze S3 Compatible API, we already had Backblaze B2 Cloud Storage, so the hard work to create a durable, scalable, and highly performant object store was already done. And B2 was already conceptually similar to S3, so the task seemed far from impossible. That’s not to say that it was easy or without any challenges. There were enough differences between the B2 Native APs and the S3 API to make the project interesting, and one of those is authentication. In this post, I’m going to walk you through how we approached the challenge of authentication in our development of the Backblaze S3 Compatible API.

The Challenge of Authentication: S3 vs. B2 Cloud Storage

B2 Cloud Storage’s approach to authentication is login/session based, where the API key ID and secret are used to log in and obtain a session ID, which is then provided on each subsequent request. S3 requires each individual request to be signed using the key ID and secret.

Our login/session approach does not require storing the API key secret on our end, only a hash of it. As a result, any compromise of our database would not allow hackers to impersonate customers and access their data. However, this approach is susceptible to “man-in-the-middle” attacks. Capturing the login request (API call to b2_authorize_account) would reveal the API key ID and secret to the attacker; capturing subsequent requests would reveal the session ID which is valid for 24 hours. Either of these would allow a hacker to impersonate a customer, which is clearly not a good thing. That said, our system and basic data safety practices will protect users. First, it is important to maintain your trusted certificate list. Our APIs are only available over HTTPS, and HTTPS in conjunction with a well managed trusted certificate list mitigates the likelihood of a “man-in-the-middle” attack.

Amazon’s approach with S3 requires their backend to store the secret because authenticating a request requires the backend to replicate the request signing process for each call. As a result, request signing is much less susceptible to a “man-in-the-middle” attack. The most any bad actor could do is replay the request; a hacker would not be able to impersonate the customer and make other requests. However, compromising the systems that store the API key secret would allow impersonation of the customer. This risk is typically mitigated by encrypting the API key secret and storing that key somewhere else, thus requiring multiple systems to be compromised.

Both approaches are common patterns for authentication, each with their own strengths and risks.

Storing the API Key Secret

To implement AWS’s request signing in our system, we first needed to figure out how to store the API key secret. A compromise of our database by a hacker who has obtained the hash of the secret for B2 does not allow that hacker to impersonate customers, but if we stored the secret itself, it absolutely would. So we couldn’t store the secret alongside the other application key data. We needed another solution, and it needed to handle the number of application keys we have (millions) and the volume of API requests we service (hundreds of thousands per minute), without slowing down requests or adding additional risks of failure.

Our solution is to encrypt the secret and store that alongside the other application key data in our database. The encryption key is then kept in a secrets management solution. The database already supports the volume of requests we service and decrypting the secret is computationally trivial, so there is no noticeable performance overhead.

With this approach, a compromise of the database alone would only reveal the encrypted version of the secret, which is just as useless as having the hash. Multiple systems must be compromised to obtain the API key secret.

Implementing the Request Signing Algorithm

We chose to only implement AWS’s Signature Version 4 as Version 2 is deprecated and is not allowed for use on newly created buckets. Within Version 4, there are multiple ways to sign the request: sign only the headers, sign the whole request, sign individual chunks, and pre-signed URLs. All of these follow a similar pattern but differ enough to warrant individual consideration for testing. We absolutely needed to get this right so we tested authentication in many ways:

Ran through Amazon’s test suite of example requests and expected signatures
Tested 20 applications that work with the Backblaze S3 Compatible API including Veeam and Synology
Ran Ceph’s S3-tests suite
Manually tested using the AWS command line interface
Manually tested using Postman
Built automated tests using both the Python and Java SDKs
Made HTTP requests directly to test cases not possible through the Python or Java SDKs
Hired ~~hackers~~ security researchers to break our implementation

With the B2 Native API authentication model, we can verify authentication by examining the “Authorization” header and only then move on to processing the request, but S3 requests—where the whole request is signed or uses signed chunks—can only be verified after reading the entire request body. For most of the S3 APIs, this is not an issue. The request bodies can be read into memory, verified, and then continue on to processing. However, for file uploads, the request body can be as large as 5GB—far too much to store in memory—so we reworked our uploading logic to handle authentication failures occurring at the end of the upload and to only record API usage after authentication passes.

The different ways to sign requests meant that in some cases we have to verify the request after the headers arrive, and in other cases verify only after the entire request body is read. We wrote the signature verification algorithm to handle each of these request types. Amazon had published a test suite (which is now no longer available, unfortunately) for request signing. This test suite was designed to help people call into the Amazon APIs, but due to the symmetric nature of the request signing process, we were able to use it as well to test our server-side implementation. This was not an authoritative or comprehensive test suite, but it was a very helpful starting point. As was the AWS command line interface, which in debug mode will output the intermediate calculations to generate the signature, namely the canonical request and string to sign.

However, when we built our APIs on top of the signature validation logic, we discovered that our APIs handled reading the request body in different ways, leading to some APIs succeeding without verifying the request, yikes! So there were even more combinations that we needed to test, and not all of these combinations could be tested using the AWS software development kits (SDKs).

For file uploads, the SDKs only signed the headers and not the request body—a reasonable choice for file uploads. But as implementers, we must support all legal requests so we made direct HTTP requests to verify whole request signing and signed chunk requests. There’s also instrumentation now to ensure that all successful requests are verified.

Looking Back

We expected this to be a big job, and it was. Testing all the corner cases of request authentication was the biggest challenge. There was no single approach that covered everything; all of the above items tested different aspects of authentication. Having a comprehensive and multifaceted testing plan allowed us to find and fix issues we would have never thought of, and ultimately gave us confidence in our implementation.