How to Leverage Your Amazon S3 Experience to Code the Backblaze B2 API

Going from S3 to learning Backblaze B2

We wrote recently about how the Backblaze B2 and Amazon S3 APIs are different. What we neglected to mention was how to bridge those differences so a developer can create a B2 interface if they’ve already coded one for S3. John Matze, Founder of BridgeSTOR, put together his list of things to consider when leveraging your S3 API experience to create a B2 interface. Thanks John.   — Andy
Backblaze B2 to Amazon S3 Conversion
by John Matze, Founder of BridgeSTOR

Backblaze B2 Cloud Storage Platform has developed into a real alternative to the Amazon S3 online storage platform with the same redundancy capabilities but at a fraction of the cost.

Sounds great — sign up today!

Wait. If you’re an application developer, it doesn’t come free. The Backblaze REST API is not compatible with Amazon S3 REST API. That is the bad news. The good news — it includes almost the entire set of functionality so converting from S3 to B2 can be done with minimal work once you understand the differences between the two platforms. We created a S3 to B2 shim in a week followed by a few extra weeks of testing and bug fixes.

This article will help you shortcut the process by describing the differences between B2 and S3.

  1. Endpoints: AWS has a standard endpoint of s3.amazonaws.com which redirects to the region where the bucket is located or you may send requests directly to the bucket by a region endpoint. B2 does not have regions, but does have an initial endpoint called api.backblazeb2.com. Every application must start by talking to this endpoint. B2 also requires three other endpoints. One for uploading an object, one for API calls and a third for downloading an object. The upload endpoint is generated on demand when uploading an object while the other two are returned during the authentication process and may be saved for API and download requests.
  1. Host: Unlike Amazon S3, the HTTP header requires the host token. If it is not present, B2 will respond with an error.
  1. JSON: Unlike S3, which uses XML, all B2 calls use JSON. Some API calls require data to be sent on the request. This data must be in JSON and all APIs return JSON as a result. Fortunately, the amount of JSON required is minimal or none at all. We just built a JSON request when required and made a simple JSON parser for returned data.
  1. Authentication: Amazon currently has two major authentication mechanisms with complicated hashing formulas. B2 simply uses the industry standard “HTTP basic auth” algorithm. It takes only a few minutes to get to speed on this algorithm.
  1. Keys: Amazon has the concept of an access key and a secret key. B2 has the equivalent with the access key being your key id (an application key id or account id) and the secret key being the application key (returned from the website or returned by b2_create_key API) that maps to the secret key.
  1. Bucket ID: Unlike S3, almost every B2 API requires a bucket ID. There is a special list bucket call that will display bucket IDs by bucket name. Once you find your bucket name, capture the bucket ID and save it for future API calls.
  1. Directory Listings: B2 Directories again have the same functionality as S3, but with a different API format. Again the mapping is easy: marker is startFileName, prefix is prefix, max-keys is maxFileCount and delimiter is delimiter. In both cases, the marker tells the service where to start the next search. The Amazon S3 nextmarker is literally the next marker to be searched. The B2 nextFileName is also the next place to search; it’s the last file name that was returned with a trailing space character. In both systems, when paging through the files in a bucket, you take the “marker” or “nextFileName” from one request and pass it to the next. You’ll get a list of all files with no repeats.
  1. Uploading an object: Uploading an object in B2 is quite different than S3. S3 just requires you to send the object to an endpoint and Amazon will automatically place the object somewhere in their environment. In the B2 world, you must request a location for the object with an API call and then send the object to the returned location. The first API will send you a temporary upload token and you can continue to use this token for one day without generating another, with the caveat that you have to monitor for failures from B2. The B2 environment may become full or some other issue will require you to request another upload token.
  1. Downloading an Object: Downloading an object in B2 is really easy. There is a download endpoint that is returned during the authentication process and you pass your request to that endpoint. The object is downloaded just like Amazon S3.
  1. Multipart Upload: Finally, multipart upload. The beast in S3 is just as much of a beast in B2. Again the good news is there is a one to one mapping.
    1. Multipart Init: The equivalent initialization returns a fileid. This ID will be used for future calls.
    2. Mulitpart Upload: Similar to uploading an object, you will need to get the API location to place the part. So use the fileid from “a” above and call B2 for the endpoint to place the part. Backblaze recommends that the upload payload to be hashed with a SHA1 algorithm. Once done, simply pass the SHA and the part number to the URL and the part is uploaded. This SHA1 component is equivalent to an etag in the S3 world so save it for later.
    3. Multipart Complete: Like S3, you will have to build a return structure for each part. B2 of course requires this structure to be in JSON but like S3, B2 requires the part number and the SHA1 (etag) for each part.

What Doesn’t Port

We found almost everything we required easily mapped from S3 to B2 except for a few issues. To be fair, Backblaze is working on the following in future versions.

  1. Copy Object doesn’t exist: This could cause some issues with applications for copying or renaming objects. BridgeSTOR has a workaround for this situation so it wasn’t a big deal for our application.
  2. Directory Objects don’t exist: Unlike Amazon, where an object with that ends with a “/” is considered a directory, this does not port to B2. There is an undocumented object name that B2 applications use called .bzEmpty. Numerous 3rd party applications, including BridgeSTOR, treat an object ending with .bzEmpty as a directory name. This is also important for directory listings described above. If you choose to use this method, you will be required to replace the “.bzEmpty” with a “/.”

In conclusion, you can see the B2 API is different than the Amazon S3, but as far as functionality they are basically the same. For us at first it looked like it was going to be a large task, but once we took the time to understand the differences, porting to B2 was not a major job for our application. I hope this document helps in your S3 to B2 conversion.

— John Matze, BridgeSTOR

 

9-4-2018 (AK) — Cleaned up the language in sections 1, 7, and 8 to reflect the most current operation of B2.

print

About Andy Klein

Andy Klein is the Principal Cloud Storage Storyteller at Backblaze. He has over 25 years of experience in technology marketing and during that time, he has shared his expertise in cloud storage and computer security at events, symposiums, and panels at RSA, SNIA SDC, MIT, the Federal Trade Commission, and hundreds more. He currently writes and rants about drive stats, Storage Pods, cloud storage, and more.