Large Files

Large files can range from 5MB all the way up to 10TB, well beyond the 5GB limit on normal files.

The 5GB limit on normal files is there primarily because that's how much you can reliably upload with a single HTTP request in a reasonable amount of time. Large files let you upload files in multiple parts.

Parts of a large file can be uploaded in parallel, which can significantly reduce the time it takes to upload terabytes of data. Each part can be anywhere from 5MB to 5GB in size, and you can pick the size that is most convenient for your application. For best upload performance, we recommend using the recommendedPartSize returned by b2_authorize_account.

Uploading Large Files

The steps to upload a large file are:

  1. b2_start_large_file,
  2. b2_get_upload_part_url for each thread uploading,
  3. b2_upload_part for each part of the file, and
  4. b2_finish_large_file.

The first step is to tell B2 that you're going to upload a large file by calling b2_start_large_file and providing the file name, content type, and custom file info. The call will return a file ID for the large file, which you must have when uploading the parts of the file.

Then, you need to decide how large each part will be. For example, if you're uploading a 100GB file, you could make each part 1GB and upload 1GB at a time. The maximum part size is 5GB, and the minimum part size is 5MB, except for the last part in a file, which has a minimum size of one byte. We recommend a part size of 100MB, which strikes a good balance between upload throughput and the ability to upload parts in parallel.

Rather than hard-coding the part size, it is a good idea to use the recommendedPartSize returned by b2_authorize_account.

The parts are numbered starting at 1, up to the number of parts needed, with a maximum of 10,000 parts for one large file.
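As a rough illustration of the sizing rules above, the following sketch splits a file of a given size into numbered parts. The function name plan_parts and the use of decimal units (1MB = 1,000,000 bytes) are assumptions made for this example, not part of the B2 API.

```python
# Limits taken from the text above; decimal units are an assumption.
MIN_PART_SIZE = 5 * 10**6   # 5MB minimum for every part except the last
MAX_PART_SIZE = 5 * 10**9   # 5GB maximum for any part
MAX_PARTS = 10_000          # maximum number of parts in one large file


def plan_parts(file_size, part_size):
    """Return a list of (part_number, offset, length) tuples covering the file.

    The requested part_size is adjusted so that the plan has at least two
    parts, at most MAX_PARTS parts, and respects the per-part size limits.
    """
    if file_size <= MIN_PART_SIZE:
        raise ValueError("file is too small to split into two valid parts")

    # Clamp to the per-part limits, and stay below the file size so that
    # there are always at least two parts.
    part_size = min(max(part_size, MIN_PART_SIZE), MAX_PART_SIZE, file_size - 1)

    # Grow the part size if the file would otherwise need too many parts.
    needed = -(-file_size // MAX_PARTS)  # integer ceiling division
    if needed > MAX_PART_SIZE:
        raise ValueError("file is too large for 10,000 parts of at most 5GB")
    part_size = max(part_size, needed)

    parts = []
    offset = 0
    number = 1  # part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

For the 100GB example above, plan_parts(100 * 10**9, 10**9) yields 100 parts of 1GB each. In practice the part_size argument would usually be the recommendedPartSize value.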

Use b2_get_upload_part_url to get the target for uploading parts. Each thread doing uploading should get its own URL.

Upload each part using b2_upload_part and providing the file ID of the large file, the part number, and the data in the part.

Finally, once all of the parts are uploaded, you can call b2_finish_large_file to transform the parts into a single B2 file. Once this is done, it looks just like any other file. You can download it, and it will show up when you list the files in a bucket.
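The four steps can be sketched end to end as follows. This is only an outline under stated assumptions: client is a hypothetical wrapper whose methods are named after the B2 calls, and the method signatures, return values, and the in-memory FakeClient are invented for illustration. They are not the actual HTTP interface.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


def upload_large_file(client, bucket_id, file_name, data, part_size,
                      content_type="application/octet-stream", threads=4):
    """Upload `data` (bytes) as a large file through a hypothetical `client`."""
    # Step 1: start the large file and keep the file ID it returns.
    file_id = client.start_large_file(bucket_id, file_name, content_type, {})

    # Split the data into parts; part numbers start at 1.
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    local = threading.local()

    def upload(numbered_chunk):
        part_number, chunk = numbered_chunk
        # Step 2: each worker thread gets its own upload URL, once.
        if not hasattr(local, "upload_url"):
            local.upload_url = client.get_upload_part_url(file_id)
        sha1 = hashlib.sha1(chunk).hexdigest()
        # Step 3: upload the part along with its part number and SHA1.
        client.upload_part(local.upload_url, part_number, chunk, sha1)
        return sha1

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in input order, so SHA1s are in part order.
        part_sha1s = list(pool.map(upload, enumerate(chunks, start=1)))

    # Step 4: combine the uploaded parts into a single file.
    return client.finish_large_file(file_id, part_sha1s)


class FakeClient:
    """In-memory stand-in used only to exercise the sketch; not a B2 client."""

    def __init__(self):
        self.parts = {}
        self.urls = 0
        self.lock = threading.Lock()

    def start_large_file(self, bucket_id, file_name, content_type, file_info):
        return "fake-file-id"

    def get_upload_part_url(self, file_id):
        with self.lock:
            self.urls += 1
            return f"https://upload.invalid/{self.urls}"

    def upload_part(self, upload_url, part_number, chunk, sha1):
        with self.lock:
            self.parts[part_number] = chunk

    def finish_large_file(self, file_id, part_sha1s):
        # Reassemble the parts in part-number order.
        return b"".join(self.parts[n] for n in sorted(self.parts))
```

The fake client ignores the 5MB minimum part size so that the flow can be tried with small inputs. A real client would also need to handle authorization, errors, and retries, which are left out here.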

Managing Uploads

Any number of large file uploads can be in progress at once. You can use b2_list_unfinished_large_files to get a list of them.

For any one upload, you can use b2_list_parts to get a list of the parts that have been uploaded so far.
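One possible use of the part list is resuming an interrupted upload. The helper below is a minimal sketch: missing_part_numbers is a hypothetical name, and it assumes the listed parts have already been collected into a list of dictionaries that each carry a partNumber key.

```python
def missing_part_numbers(total_parts, uploaded_parts):
    """Return the 1-based part numbers that still need to be uploaded.

    `uploaded_parts` is assumed to be a list of dicts with a "partNumber"
    key, built from the results of b2_list_parts.
    """
    done = {part["partNumber"] for part in uploaded_parts}
    return [n for n in range(1, total_parts + 1) if n not in done]
```

For example, with five parts planned and parts 1 and 3 already listed, the helper returns [2, 4, 5].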

If you have started uploading a large file, but don't want to finish, you can use b2_cancel_large_file to delete all of the parts that have been uploaded so far.

Usage Charges for Large Files

In most ways, large files are treated the same as small files. The costs for the API calls are the same.

You will be charged for storage for parts that have been uploaded. Usage is counted from the time the upload is done. When you call b2_finish_large_file, the parts are combined into one big file, but the number of bytes stored remains the same, so it doesn't affect the charge for the storage.

SHA1 Checksums

Large files do not require a SHA1 checksum on the entire file, but Backblaze recommends one. If the caller knows the SHA1 of the entire large file being uploaded, Backblaze recommends specifying it in the fileInfo during the call to b2_start_large_file. In the fileInfo, use the key large_file_sha1, and for the value use a 40-character hex string representing the SHA1.

For each part of the large file, a SHA1 is required. You must provide a SHA1 checksum when you upload it, which is used to validate the data sent, and which is stored internally with that part.
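Both kinds of checksum can be computed with Python's standard hashlib module, as in the sketch below. The helper names and the tiny example parts are assumptions for illustration. Only the large_file_sha1 key comes from the text above.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA1 of one part as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def whole_file_sha1(chunks):
    """Return the SHA1 of the whole file, fed one chunk at a time."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


# Tiny stand-in parts; real parts would be at least 5MB except the last.
parts = [b"a", b"bc"]
part_sha1s = [sha1_hex(part) for part in parts]            # one SHA1 per part
file_info = {"large_file_sha1": whole_file_sha1(parts)}    # optional, whole file
```

Feeding the parts through one hash object in order gives the same digest as hashing the complete file, so the whole-file SHA1 can be computed without holding the entire file in memory.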

When you download a large file, or a range of a large file, there is no checksum for the entire file, so the string none is returned in the X-Bz-Content-Sha1 header.

Accessing Large Files

Once a large file is uploaded, you can do anything with it that you can do with a normal file.

When downloading large files, the Range header can be especially useful. It lets you download just part of the file. For details, see: b2_download_file_by_name and b2_download_file_by_id.
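As a small illustration, the sketch below builds the Range header values needed to fetch a file in fixed-size pieces. The function name range_headers is hypothetical. The bytes=start-end form with an inclusive end is standard HTTP range syntax.

```python
def range_headers(total_size, chunk_size):
    """Yield Range header dicts covering total_size bytes in chunk_size pieces."""
    for start in range(0, total_size, chunk_size):
        # The end position of an HTTP byte range is inclusive.
        end = min(start + chunk_size, total_size) - 1
        yield {"Range": f"bytes={start}-{end}"}
```

Each dictionary can be passed as the request headers for one download call, which lets different pieces be fetched separately or in parallel.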

Limits

Large files can range in size from 5MB to 10TB.

Each large file must consist of at least 2 parts, and all of the parts except the last one must be at least 5MB in size. The last part must contain at least one byte.