Uploading
The B2 APIs are simple and straightforward, but there are still a few things that you need to look out for when writing code to upload files.
Uploading Single Files
To upload a single file, first you call b2_get_upload_url to get a URL, then you call b2_upload_file using that URL. If everything goes as planned, that's it and you are done.
The upload URL you get is targeted at a single storage pod in the Backblaze data center. This makes uploads efficient because you are sending the data directly to the place where it will be stored. But, it means that if that storage pod is unable to take your data right now, you'll have to get a new upload URL and try again.
We recommend that you write your code to try five different upload URLs before reporting an error. Two attempts are almost always good enough, and five failures is a sure sign that something is wrong with your request, or that you are having problems connecting to the B2 service.
Some errors returned from b2_upload_file mean that you should get a new upload URL and try again, while others mean that there is a problem with your request and trying again will not help.
These indicate that you should get a new upload URL and try again:
- Unable to make an HTTP connection, including connection timeout.
- Status of 401 Unauthorized, and an error code of expired_auth_token
- Status of 408 Request Timeout
- Any HTTP status in the 5xx range, including 503 Service Unavailable
- "Broken pipe" sending the contents of the file.
- A timeout waiting for a response (socket timeout).
The "broken pipe" error happens when you are sending a file big enough that the buffers in the HTTP connection won't hold it. HTTP client libraries send the entire request before looking for a response, and if the B2 server has already replied with an error, you'll be unable to send the entire file and will get a "broken pipe" error.
Other errors you may get while uploading are:
- 400 Bad Request
  - bad_request - various problems, including the file already being finished
  - cap_exceeded - you have reached the storage cap that you set
- 401 Unauthorized
  - missing_auth_token - there is no Authorization header
  - bad_auth_token - the authorization token is not valid
- 403 Forbidden
  - cap_exceeded - you have reached the storage cap that you set
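The retry rules in this section can be condensed into one small decision function. The following is a minimal sketch and not part of any B2 SDK: the class name, the Action enum, and the convention of passing -1 when no HTTP response was received (connection failure, broken pipe, socket timeout) are assumptions made for illustration. Back-off handling for 429 Too Many Requests is left out.

```java
/** Hypothetical helper; all names here are illustrative. */
final class UploadErrorClassifier {

    enum Action { DONE, NEW_URL_AND_RETRY, GIVE_UP }

    /**
     * @param status HTTP status, or -1 (an assumed convention) if no HTTP
     *               response arrived: connection failure, broken pipe,
     *               or socket timeout
     * @param code   error code string from the response; may be null
     */
    static Action classify(int status, String code) {
        if (status == 200) {
            return Action.DONE;
        }
        if (status == -1) {
            // No response at all: try a different upload URL.
            return Action.NEW_URL_AND_RETRY;
        }
        if (status == 401 && "expired_auth_token".equals(code)) {
            return Action.NEW_URL_AND_RETRY;
        }
        if (status == 408) {
            return Action.NEW_URL_AND_RETRY;
        }
        if (status >= 500 && status <= 599) {
            return Action.NEW_URL_AND_RETRY;
        }
        // 400 bad_request, 401 bad_auth_token, 403 cap_exceeded, and so on:
        // the request itself is the problem, so retrying will not help.
        return Action.GIVE_UP;
    }
}
```

A caller would count the NEW_URL_AND_RETRY results and stop after five of them.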
SHA1 Checksums
You must always include the X-Bz-Content-Sha1 header with your upload request. The value you provide can be: (1) the 40-character hex checksum of the file, (2) the string hex_digits_at_end, or (3) the string do_not_verify.
Whenever possible, we recommend the first option, including the checksum in the header. A request to upload a 5-byte file containing the string "hello" would look like this:
    Authorization: <auth_token>
    X-Bz-File-Name: hello.txt
    Content-Length: 5
    Content-Type: text/plain
    X-Bz-Content-Sha1: aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d

    hello
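The header value for the first option can be computed with the standard java.security.MessageDigest class. This is a minimal sketch: the class and method names are illustrative, and it assumes the whole file fits in memory as a byte array.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for producing the X-Bz-Content-Sha1 value. */
final class Sha1Hex {

    /** Returns the SHA-1 of data as 40 lowercase hex characters. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] contents = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("X-Bz-Content-Sha1: " + sha1Hex(contents));
    }
}
```

For files too large to hold in memory, the same MessageDigest can be fed in chunks with update() before calling digest().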
With the second option, you append the 40-character hex SHA1 checksum to the end of the request body, immediately after the contents of the file being uploaded. Note that the content length is the size of the file plus 40.
    Authorization: <auth_token>
    X-Bz-File-Name: hello.txt
    Content-Length: 45
    Content-Type: text/plain
    X-Bz-Content-Sha1: hex_digits_at_end

    helloaaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
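One way to produce such a body is to compute the checksum while the file is being streamed, then write the 40 hex digits after the last byte. The sketch below assumes the caller supplies the file as an InputStream and the request body as an OutputStream; the class and method names are hypothetical. Because Content-Length is sent before the body, the caller still needs to know the file size up front and add 40 to it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for the hex_digits_at_end option. */
final class HexDigitsAtEnd {

    /**
     * Copies the file to the request body, hashing as it goes, then
     * appends the 40-character hex SHA-1.
     *
     * @return total bytes written (file size plus 40)
     */
    static long writeBodyWithTrailingSha1(InputStream file, OutputStream body)
            throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }

        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = file.read(buffer)) != -1) {
            md.update(buffer, 0, n);   // hash the chunk
            body.write(buffer, 0, n);  // and send the same chunk
            total += n;
        }

        StringBuilder hex = new StringBuilder(40);
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        byte[] trailer = hex.toString().getBytes(StandardCharsets.US_ASCII);
        body.write(trailer);
        return total + trailer.length;
    }
}
```

This reads the file only once, which is the main reason to prefer it over hashing first and uploading afterwards.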
We do not recommend the final option: specifying do_not_verify as the checksum and letting B2 compute the checksum of the file. In the case where there has been data corruption and the checksum doesn't match the data sent, the first two options give B2 the opportunity to verify the checksum, and reject the upload without storing anything in B2. With this final option, the file is stored no matter what, and you have to delete it yourself if there is a problem with the checksum. This is what the third option looks like:
    Authorization: <auth_token>
    X-Bz-File-Name: hello.txt
    Content-Length: 5
    Content-Type: text/plain
    X-Bz-Content-Sha1: do_not_verify

    hello
If you choose the do_not_verify option, the checksum returned in the response from uploading, when listing files, and when downloading the file will have "unverified:" prepended to the checksum, like this:
    X-Bz-Content-Sha1: unverified:aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
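Code that reads this value back has to cope with both forms. A minimal sketch of one way to do that follows; the ContentSha1 class is hypothetical and only handles the two forms described here.

```java
/** Hypothetical value type for a returned X-Bz-Content-Sha1 value. */
final class ContentSha1 {

    private static final String UNVERIFIED_PREFIX = "unverified:";

    final String hex;        // the checksum without any prefix
    final boolean verified;  // false when the value carried "unverified:"

    private ContentSha1(String hex, boolean verified) {
        this.hex = hex;
        this.verified = verified;
    }

    /** Splits an optional "unverified:" prefix from the checksum. */
    static ContentSha1 parse(String headerValue) {
        if (headerValue.startsWith(UNVERIFIED_PREFIX)) {
            String hex = headerValue.substring(UNVERIFIED_PREFIX.length());
            return new ContentSha1(hex, false);
        }
        return new ContentSha1(headerValue, true);
    }
}
```

A downloader could compare `hex` against a locally computed checksum in either case, and use `verified` to decide how much to trust a mismatch report.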
Uploading in Parallel
The URL and authorization token that you get from b2_get_upload_url can be used by only one thread at a time. If you want multiple threads running, each one needs to get its own URL and auth token. It can keep using that URL and auth token for multiple uploads, until it gets a returned status indicating that it should get a new upload URL.
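The ownership rule can be sketched with a fixed pool of workers, each holding its own URL and token in a local variable. Everything below is an assumption for illustration: the two functional parameters stand in for b2_get_upload_url and b2_upload_file, a single string stands in for the URL and auth token pair, and a failed upload is simply dropped rather than retried.

```java
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;
import java.util.function.Supplier;

/** Hypothetical parallel uploader; not part of any B2 SDK. */
final class ParallelUploader {

    /**
     * @param files        thread-safe queue of file names to upload
     * @param getUploadUrl stand-in for b2_get_upload_url
     * @param upload       stand-in for b2_upload_file: (credentials, file)
     *                     to true on success
     * @return number of files uploaded successfully
     */
    static int uploadAll(Queue<String> files,
                         int workerCount,
                         Supplier<String> getUploadUrl,
                         BiFunction<String, String, Boolean> upload)
            throws InterruptedException {
        AtomicInteger uploaded = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            pool.execute(() -> {
                // Owned by this worker only; never shared between threads.
                String credentials = null;
                String file;
                while ((file = files.poll()) != null) {
                    if (credentials == null) {
                        credentials = getUploadUrl.get();
                    }
                    if (upload.apply(credentials, file)) {
                        uploaded.incrementAndGet();
                    } else {
                        // Ask for a fresh URL before the next upload.
                        credentials = null;
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return uploaded.get();
    }
}
```

With four workers and uploads that always succeed, the stand-in for b2_get_upload_url is called at most four times, no matter how many files are in the queue.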
Uploading Large Files
The process for uploading the parts of a large file is just like uploading individual files, except that you use b2_get_upload_part_url to get the upload URL and authorization token, and use b2_upload_part for each of the parts.
As with regular files, each thread that uploads must make its own call to b2_get_upload_part_url.
After you have uploaded the parts, use b2_finish_large_file to combine the uploaded parts into one large file.
It may be that the call to finish a large file succeeds, but you don't know it because the request timed out, or the connection was broken. In that case, retrying will result in a 400 Bad Request response because the file is already finished. If that happens, we recommend calling b2_get_file_info to see if the file is there; if the file is there, you can count the upload as a success.
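The recovery path for a lost finish response can be sketched as follows. The B2Client interface is hypothetical: its two methods stand in for b2_finish_large_file and b2_get_file_info, and their signatures are assumptions. A 400 is treated as a possible success only when an earlier attempt ended without a response.

```java
import java.io.IOException;

/** Hypothetical sketch of finishing a large file with a fallback check. */
final class FinishLargeFile {

    /** Assumed minimal client; not the real B2 API surface. */
    interface B2Client {
        /** Returns the HTTP status; throws on timeout or broken connection. */
        int finishLargeFile(String fileId) throws IOException;

        /** Stand-in for b2_get_file_info: true if the file is there. */
        boolean fileExists(String fileId);
    }

    static boolean finish(B2Client client, String fileId, int maxAttempts) {
        boolean outcomeUnknown = false;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                int status = client.finishLargeFile(fileId);
                if (status == 200) {
                    return true;
                }
                if (status == 400 && outcomeUnknown) {
                    // An earlier attempt may have succeeded without us
                    // seeing the response, so check for the file.
                    return client.fileExists(fileId);
                }
                return false;
            } catch (IOException e) {
                // Timed out or disconnected: the finish may or may not
                // have gone through.
                outcomeUnknown = true;
            }
        }
        // Every attempt was lost; the file may still have been finished.
        return outcomeUnknown && client.fileExists(fileId);
    }
}
```

A 400 on the very first attempt is reported as a failure, since nothing suggests the file was already finished.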
Code Structure
This Java-like code is an outline that shows how to upload multiple files. It can be used either in a single-threaded application, or as one of the threads in a parallel uploader. It assumes that it has a Queue of files to upload, and runs forever uploading files from the queue. It gets a new URL and auth token when it has a file to upload and the old one is no good any more.
    void uploadFiles(Queue<UploadInfo> queue) {

        // Initially, we don't have an upload URL and authorization token
        UrlAndAuthToken urlAndAuthToken = null;

        // Keep looping and uploading files forever
        while (true) {

            // Get the info on the next file to upload
            UploadInfo uploadInfo = queue.take();

            // Try several times to upload the file.  It's normal
            // for uploads to fail if the target storage pod is
            // too busy.  It's also normal (but infrequent) to get
            // a 429 Too Many Requests if you are uploading a LOT
            // of files.
            boolean succeeded = false;
            B2Response response = null;  // declared here so it is visible after the loop
            for (int i = 0; i < 5 && !succeeded; i++) {

                // Get a new upload URL and auth token, if needed.
                if (urlAndAuthToken == null) {
                    B2Request getUrlRequest = makeGetUploadUrlRequest();
                    B2Response getUrlResponse = callB2WithBackOff(getUrlRequest);
                    if (getUrlResponse.status != 200 /*OK*/) {
                        reportFailure(uploadInfo, getUrlResponse);
                        return;
                    }
                    urlAndAuthToken = getUrlResponse.getUrlAndAuthToken();
                }

                // Upload the file.  When calling upload, don't use
                // back-off.  If there's any problem, we want to go
                // around the loop again and get another upload URL.
                B2Request uploadRequest = makeUploadRequest(uploadInfo);
                response = callHttpService(uploadRequest);
                int status = response.status;
                if (status == 200 /*OK*/) {
                    reportSuccess(uploadInfo);
                    succeeded = true;
                    break;
                }
                else if (response.isFailureToConnect()) {
                    // Try connecting somewhere else next time.
                    urlAndAuthToken = null;
                }
                else if (response.isBrokenPipe()) {
                    // Could not send entire file.  Try connecting somewhere else next time.
                    // If upload caps are exceeded, the next call to get an upload URL will
                    // respond with a useful error message.
                    urlAndAuthToken = null;
                }
                else if (status == 401 /* Unauthorized */ &&
                         (response.status_code.equals("expired_auth_token") ||
                          response.status_code.equals("bad_auth_token"))) {
                    // Upload auth token has expired.  Time for a new one.
                    urlAndAuthToken = null;
                }
                else if (status == 408 /* Request Timeout */) {
                    // Retry and hope the upload goes faster this time
                    exponentialBackOff();
                }
                else if (status == 429 /* Too Many Requests */) {
                    // We are making too many requests
                    exponentialBackOff();
                }
                else if (status >= 500 && status <= 599) {
                    // Server-side error, such as 503 Service Unavailable.
                    // Get a new upload URL and try again.
                    urlAndAuthToken = null;
                }
                else {
                    // Something else went wrong.  Give up.
                    reportFailure(uploadInfo, response);
                    return;
                }
            }

            if (!succeeded) {
                reportFailure(uploadInfo, response);
                return;
            }
        }
    }

    B2Response callB2WithBackOff(B2Request request) {
        int delaySeconds = 1;
        int maxDelay = 64;
        while (true) {
            B2Response response = callHttpService(request);
            int status = response.status;
            if (status == 429 /*Too Many Requests*/) {
                // The server says how long to wait, in seconds.
                sleepSeconds(Integer.parseInt(response.getHeader("Retry-After")));
                delaySeconds = 1; // reset 503 back-off
            }
            else if (status == 503 /*Service Unavailable*/) {
                if (maxDelay < delaySeconds) {
                    // give up -- delay is too long
                    return response;
                }
                sleepSeconds(delaySeconds);
                delaySeconds = delaySeconds * 2;
            }
            else {
                return response;
            }
        }
    }