Uploading

The B2 APIs are simple and straightforward, but there are still a few things that you need to look out for when writing code to upload files.

Uploading Single Files

To upload a single file, first call b2_get_upload_url to get an upload URL and authorization token, then call b2_upload_file using that URL and token. If everything goes as planned, you are done.

The upload URL you get is targeted at a single storage pod in the Backblaze data center. This makes uploads efficient because you are sending the data directly to the place where it will be stored. However, it also means that if that storage pod cannot take your data right now, you must get a new upload URL and try again.

We recommend that you write your code to try five different upload URLs before reporting an error. Two attempts are almost always good enough, and five failures is a sure sign that something is wrong with your request, or that you are having problems connecting to the B2 service.

Some errors returned from b2_upload_file mean that you should get a new upload URL and try again, while others mean that there is a problem with your request and trying again will not help. The following errors indicate that you should get a new upload URL and try again:

  • Unable to make an HTTP connection, including connection timeout.
  • Status of 401 Unauthorized, and an error code of expired_auth_token.
  • Status of 408 Request Timeout.
  • Any HTTP status in the 5xx range, including 503 Service Unavailable.
  • "Broken pipe" sending the contents of the file.
  • A timeout waiting for a response (socket timeout).

The "broken pipe" error happens when you are sending a file big enough that the buffers in the HTTP connection won't hold it. Many HTTP client libraries send the entire request before looking for a response. If the B2 server has already replied with an error and closed the connection, you'll be unable to send the entire file and will get a "broken pipe" error instead of seeing the server's response.

Other errors you may get while uploading are:

  • 400 Bad Request
    • bad_request - message will describe the problem
  • 401 Unauthorized
    • missing_auth_token - there is no Authorization header
    • bad_auth_token - the authorization token is not valid
  • 403 Forbidden
    • cap_exceeded - you have reached the storage cap that you set
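The two lists above can be folded into one decision function. The following is a minimal sketch, not part of any B2 library: the class, the enum, and the convention of passing -1 when no HTTP response arrived are all assumptions made for illustration. The 429 case follows the code outline later on this page, which backs off and retries on Too Many Requests.

```java
/** Hypothetical helper: decides what to do after a failed b2_upload_file call. */
final class UploadErrorClassifier {

    enum Action { RETRY_WITH_NEW_URL, BACK_OFF_AND_RETRY, GIVE_UP }

    /** Convention invented for this sketch: no HTTP response was received
     *  (connection failure, broken pipe, or socket timeout). */
    static final int NO_RESPONSE = -1;

    /**
     * @param status HTTP status of the response, or NO_RESPONSE
     * @param code   error code from the response body; may be null
     */
    static Action classify(int status, String code) {
        if (status == NO_RESPONSE) {
            return Action.RETRY_WITH_NEW_URL;
        }
        if (status == 401 && "expired_auth_token".equals(code)) {
            return Action.RETRY_WITH_NEW_URL;
        }
        if (status == 408) {
            return Action.RETRY_WITH_NEW_URL;
        }
        if (status >= 500 && status <= 599) {
            return Action.RETRY_WITH_NEW_URL;
        }
        if (status == 429) {
            return Action.BACK_OFF_AND_RETRY;
        }
        // bad_request, missing_auth_token, bad_auth_token, cap_exceeded, ...
        // Retrying the same request will not help.
        return Action.GIVE_UP;
    }
}
```

Keeping the decision in one place makes it easier to keep the retry loop and the error lists in agreement.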

SHA1 Checksums

You must always include the X-Bz-Content-Sha1 header with your upload request. The value you provide can be: (1) the 40-character hex checksum of the file, (2) the string hex_digits_at_end, or (3) the string do_not_verify.

Whenever possible, we recommend the first option, including the checksum in the header. A request to upload a 5-byte file containing the string "hello" would look like this:

Authorization: <auth_token>
X-Bz-File-Name: hello.txt
Content-Length: 5
Content-Type: text/plain
X-Bz-Content-Sha1: aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d

hello
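One way to compute the value for the header is with the JDK's MessageDigest class. This is a minimal sketch: the class and method names are made up for illustration, and it hashes an in-memory byte array, so a large file would need a streaming approach instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for building the X-Bz-Content-Sha1 header value. */
final class Sha1Hex {

    /** Returns the SHA1 of data as 40 lowercase hex characters. */
    static String sha1Hex(byte[] data) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
        byte[] hash = digest.digest(data);
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            // Mask to 0..255 so negative bytes still print as two digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] contents = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("X-Bz-Content-Sha1: " + sha1Hex(contents));
    }
}
```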

With the second option, you append the 40-character hex SHA1 checksum to the end of the request body, immediately after the contents of the file being uploaded. Note that the content length is the size of the file plus 40.

Authorization: <auth_token>
X-Bz-File-Name: hello.txt
Content-Length: 45
Content-Type: text/plain
X-Bz-Content-Sha1: hex_digits_at_end

helloaaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
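The second option is useful when you want to compute the checksum while streaming the file, without reading it twice. The sketch below assumes the request body is written to an OutputStream. The class and method names are hypothetical, and it uses the JDK's DigestOutputStream to hash the bytes as they pass through. The Content-Length header still has to be sent before the body, which works because the total is known up front: the file size plus 40.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that writes a hex_digits_at_end request body. */
final class HexDigitsAtEnd {

    /**
     * Copies the file contents from in to out, then appends the 40 hex
     * digits of their SHA1. Returns the total number of bytes written.
     */
    static long writeBody(InputStream in, OutputStream out) throws IOException {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }

        // Everything written through this stream is also fed to the digest.
        DigestOutputStream digesting = new DigestOutputStream(out, sha1);
        byte[] buffer = new byte[8192];
        long fileBytes = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            digesting.write(buffer, 0, n);
            fileBytes += n;
        }
        digesting.flush();

        StringBuilder hex = new StringBuilder(40);
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        // The checksum itself goes straight to the underlying stream.
        out.write(hex.toString().getBytes(StandardCharsets.US_ASCII));
        return fileBytes + 40;
    }
}
```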

We do not recommend the final option: specifying do_not_verify as the checksum and letting B2 compute the checksum of the file. In the case where there has been data corruption and the checksum doesn't match the data sent, the first two options give B2 the opportunity to verify the checksum, and reject the upload without storing anything in B2. With this final option, the file is stored no matter what, and you have to delete it yourself if there is a problem with the checksum. This is what the third option looks like:

Authorization: <auth_token>
X-Bz-File-Name: hello.txt
Content-Length: 5
Content-Type: text/plain
X-Bz-Content-Sha1: do_not_verify

hello

If you choose the do_not_verify option, the checksum returned in the response from uploading, when listing files, and when downloading the file will have "unverified:" prepended to the checksum, like this:

X-Bz-Content-Sha1: unverified:aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
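Code that reads this header needs to allow for the prefix before comparing checksums. A minimal sketch, with a hypothetical class name, that separates the hex digits from the verification status:

```java
/** Hypothetical parser for the X-Bz-Content-Sha1 value returned by B2. */
final class ContentSha1 {

    private static final String UNVERIFIED_PREFIX = "unverified:";

    /** The hex checksum, without any prefix. */
    final String hex;

    /** False when the value carried the "unverified:" prefix. */
    final boolean verified;

    private ContentSha1(String hex, boolean verified) {
        this.hex = hex;
        this.verified = verified;
    }

    static ContentSha1 parse(String headerValue) {
        if (headerValue.startsWith(UNVERIFIED_PREFIX)) {
            String hex = headerValue.substring(UNVERIFIED_PREFIX.length());
            return new ContentSha1(hex, false);
        }
        return new ContentSha1(headerValue, true);
    }
}
```

A downloader could then compare the hex field against a locally computed checksum in both cases, and treat an unverified value as a hint rather than a guarantee.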

Uploading in Parallel

The URL and authorization token that you get from b2_get_upload_url can be used by only one thread at a time. If you want multiple threads running, each one needs to get its own URL and auth token. It can keep using that URL and auth token for multiple uploads, until it gets a returned status indicating that it should get a new upload URL.
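The rule above can be sketched with a thread pool in which every worker fetches its own upload target and never shares it. Everything here is an assumption made for illustration: UploadTarget is a hypothetical holder for the URL and token, and the Supplier stands in for a real call to b2_get_upload_url. The upload itself is left as a comment.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class ParallelUploadSketch {

    /** Hypothetical holder for the result of b2_get_upload_url. */
    static final class UploadTarget {
        final String url;
        final String authToken;

        UploadTarget(String url, String authToken) {
            this.url = url;
            this.authToken = authToken;
        }
    }

    /**
     * Runs threadCount workers. Each worker gets its own upload target and
     * reuses it for all of its uploads. Returns the URLs the workers used.
     */
    static Set<String> run(int threadCount, int uploadsPerThread,
                           Supplier<UploadTarget> getUploadUrl)
            throws InterruptedException {
        Set<String> urlsUsed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int t = 0; t < threadCount; t++) {
            pool.execute(() -> {
                // Private to this worker; never handed to another thread.
                UploadTarget target = getUploadUrl.get();
                for (int i = 0; i < uploadsPerThread; i++) {
                    // A real worker would call b2_upload_file here using
                    // target.url and target.authToken, and would fetch a
                    // new target after a retryable error.
                    urlsUsed.add(target.url);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return urlsUsed;
    }
}
```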

Uploading Large Files

The process for uploading the parts of a large file is just like uploading individual files, except that you use b2_get_upload_part_url to get the upload URL and authorization token, and use b2_upload_part for each of the parts.

As with regular files, each thread that uploads must make its own call to b2_get_upload_part_url.

Code Structure

This Java-like code is an outline that shows how to upload multiple files. It can be used either in a single-threaded application, or as one of the threads in a parallel uploader. It assumes that it has a Queue of files to upload, and runs forever uploading files from the queue. It gets a new URL and auth token when it has a file to upload and the old URL and token are no longer usable.

void uploadFiles(Queue<UploadInfo> queue) {
    // Initially, we don't have an upload URL and authorization token
    UrlAndAuthToken urlAndAuthToken = null;

    // Keep looping and uploading files forever
    while (true) {
        // Get the info on the next file to upload
        UploadInfo uploadInfo = queue.take();

        // Try several times to upload the file.  It's normal
        // for uploads to fail if the target storage pod is
        // too busy.  It's also normal (but infrequent) to get
        // a 429 Too Many Requests if you are uploading a LOT
        // of files.

        boolean succeeded = false;
        B2Response response = null;  // last upload response, kept for error reporting
        for (int i = 0; i < 5 && !succeeded; i++) {

            // Get a new upload URL and auth token, if needed.
            if (urlAndAuthToken == null) {
                B2Request getUrlRequest = makeGetUploadUrlRequest();
                B2Response getUrlResponse = callB2WithBackOff(getUrlRequest);
                if (getUrlResponse.status != 200 /*OK*/) {
                    reportFailure(uploadInfo, getUrlResponse);
                    return;
                }
                urlAndAuthToken = getUrlResponse.getUrlAndAuthToken();
            }

            // Upload the file.  When calling upload, don't use
            // callB2WithBackOff.  If there's any problem, we want to go
            // around the loop again and get another upload URL.
            B2Request uploadRequest = makeUploadRequest(urlAndAuthToken, uploadInfo);
            response = callHttpService(uploadRequest);
            int status = response.status;
            if (status == 200 /*OK*/) {
                reportSuccess(uploadInfo);
                succeeded = true;
                break;
            }
            else if (response.isFailureToConnect() || response.isSocketTimeout()) {
                // Could not connect, or no response arrived in time.
                // Try connecting somewhere else next time.
                urlAndAuthToken = null;
            }
            else if (response.isBrokenPipe()) {
                // Could not send entire file.  Try connecting somewhere else next time.
                // If upload caps are exceeded, the next call to get an upload URL will
                // respond with a useful error message.
                urlAndAuthToken = null;
            }
            else if (status == 401 /* Unauthorized */
                     && response.status_code.equals("expired_auth_token")) {
                // Upload auth token has expired.  Time for a new one.
                urlAndAuthToken = null;
            }
            else if (status == 408 /* Request Timeout */) {
                // Get a new upload URL, back off, and hope the upload
                // goes faster this time
                urlAndAuthToken = null;
                exponentialBackOff();
            }
            else if (status == 429 /* Too Many Requests */) {
                // We are making too many requests
                exponentialBackOff();
            }
            else if (status >= 500 && status <= 599) {
                // Server-side problem, such as 503 Service Unavailable.
                // Try connecting somewhere else next time.
                urlAndAuthToken = null;
            }
            else {
                // Something else went wrong.  Give up.
                reportFailure(uploadInfo, response);
                return;
            }
        }

        if (!succeeded) {
            // All five attempts failed.  Report the last response.
            reportFailure(uploadInfo, response);
            return;
        }
    }
}

B2Response callB2WithBackOff(B2Request request) {
    int delaySeconds = 1;
    int maxDelay = 64;
    while (true) {
        B2Response response = callHttpService(request);
        int status = response.status;
        if (status == 429 /*Too Many Requests*/) {
            // Wait as long as the server asks.  This assumes that
            // Retry-After holds a number of seconds.
            sleepSeconds(Integer.parseInt(response.getHeader("Retry-After")));
            delaySeconds = 1; // reset 503 back-off
        }
        else if (status == 503 /*Service Unavailable*/) {
            if (maxDelay < delaySeconds) {
                // Give up.  The delay has grown too long.
                return response;
            }
            sleepSeconds(delaySeconds);
            delaySeconds = delaySeconds * 2;
        }
        else {
            return response;
        }
    }
}
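The outline calls exponentialBackOff() without defining it. One possible shape for such a helper is sketched below, with the sleeping left to the caller. The class name, the starting delay of 1 second, and the choice to cap the delay rather than give up are all assumptions made for illustration. They mirror the doubling used in callB2WithBackOff, not any documented B2 behavior.

```java
/** Hypothetical helper that produces doubling delays up to a cap. */
final class ExponentialBackOff {

    private final int maxDelaySeconds;
    private int nextDelaySeconds = 1;

    ExponentialBackOff(int maxDelaySeconds) {
        this.maxDelaySeconds = maxDelaySeconds;
    }

    /** Returns how long to sleep now, and doubles the delay for next time. */
    int nextDelay() {
        int delay = nextDelaySeconds;
        nextDelaySeconds = Math.min(nextDelaySeconds * 2, maxDelaySeconds);
        return delay;
    }

    /** Call after a successful request to start again from 1 second. */
    void reset() {
        nextDelaySeconds = 1;
    }
}
```

With a cap of 64, successive calls to nextDelay() return 1, 2, 4, 8, 16, 32, 64, and then stay at 64 until reset() is called.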