Large Files

Large files can range from 5MB all the way up to 10TB, well beyond the 5GB limit on normal files.

The 5GB limit on normal files exists primarily because that is about as much data as you can reliably upload with a single HTTP request in a reasonable amount of time. Large files are instead created by assembling parts, where each part is either uploaded or copied from an existing file in any bucket belonging to the same account as the large file. A single large file can be assembled from a mix of uploaded and copied parts.

Parts of a large file can be uploaded and copied in parallel, which can significantly reduce the time it takes to upload terabytes of data. Each part except the last can be anywhere from 5MB to 5GB in size, and you can pick the size that is most convenient for your application. For best upload performance, we recommend using the recommendedPartSize returned by b2_authorize_account.

Creating Large Files

The steps to create a large file are:

  1. b2_start_large_file,
  2. b2_get_upload_part_url for each thread that uploads parts,
  3. b2_upload_part OR b2_copy_part for each part of the file, and
  4. b2_finish_large_file.
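The four steps above can be sketched in Python. This is a minimal sketch, not an official client: `call` is a hypothetical function that sends a B2 API request and returns the decoded JSON response, and `upload` is a hypothetical function that POSTs one part to an upload part URL. Injecting both keeps the control flow visible and lets it run without a network. The content type `b2/x-auto` asks B2 to choose a content type based on the file name.

```python
import hashlib
from typing import Callable, List

# Hypothetical transport: call(api_name, json_body) -> decoded JSON response.
ApiCall = Callable[[str, dict], dict]
# Hypothetical uploader: upload(upload_url_info, part_number, data, sha1_hex).
Uploader = Callable[[dict, int, bytes, str], dict]


def create_large_file(call: ApiCall, upload: Uploader, bucket_id: str,
                      file_name: str, parts: List[bytes]) -> dict:
    """Create a large file from in-memory parts using the four steps."""
    # 1. Start the large file; the returned fileId is used by every later call.
    started = call("b2_start_large_file", {
        "bucketId": bucket_id,
        "fileName": file_name,
        "contentType": "b2/x-auto",
        "fileInfo": {},
    })
    file_id = started["fileId"]

    # 2. Get an upload part URL (this sketch uses one thread, so one URL).
    url_info = call("b2_get_upload_part_url", {"fileId": file_id})

    # 3. Upload each part, numbering from 1 and remembering each part's SHA1.
    part_sha1s = []
    for part_number, data in enumerate(parts, start=1):
        sha1_hex = hashlib.sha1(data).hexdigest()
        upload(url_info, part_number, data, sha1_hex)
        part_sha1s.append(sha1_hex)

    # 4. Finish the file, listing the part SHA1s in part-number order.
    return call("b2_finish_large_file",
                {"fileId": file_id, "partSha1Array": part_sha1s})
```

A real uploader would read each part from disk rather than holding every part in memory, and would use one upload part URL per thread.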

The first step is to tell B2 that you're going to create a large file by calling b2_start_large_file and providing the bucket ID, file name, content type, and custom file info. The call returns a file ID for the large file, which you need for all of the remaining steps: getting upload part URLs, uploading or copying parts, and finishing the file.
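As a sketch, assuming the v2 native API path `/b2api/v2/` and the `apiUrl` and `authorizationToken` values returned by b2_authorize_account, the b2_start_large_file request could be built with the Python standard library as follows. The function name `start_large_file_request` is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def start_large_file_request(api_url: str, auth_token: str, bucket_id: str,
                             file_name: str, content_type: str,
                             file_info: dict) -> urllib.request.Request:
    """Build (but do not send) a b2_start_large_file request."""
    body = {
        "bucketId": bucket_id,
        "fileName": file_name,
        "contentType": content_type,
        "fileInfo": file_info,  # custom key/value pairs stored with the file
    }
    return urllib.request.Request(
        f"{api_url}/b2api/v2/b2_start_large_file",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": auth_token,
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the JSON response contains the `fileId`.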

Then, you need to decide where your parts are coming from. Each part can either be uploaded, or copied from an existing file in any bucket belonging to the same account as the large file. Note that large files can be assembled from a mix of uploaded and copied parts.

You need to determine the size of each part you upload or copy. For example, if you're uploading a 100GB file, you could make each part 1GB, and perform 100 uploads, 1GB at a time. Note that parts need not all be the same size. The maximum part size is 5GB, and the minimum part size is 5MB, except for the last part in a file, which has a minimum size of one byte. We recommend a part size of 100MB, which strikes a good balance between upload throughput and the ability to upload parts in parallel.
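Splitting a file into parts can be sketched as below. `plan_parts` is a hypothetical helper, and it treats 5MB and 5GB as decimal units (5,000,000 and 5,000,000,000 bytes), which is an assumption of this sketch. Every part but the last has exactly `part_size` bytes, and the last part holds whatever remains, so it can be as small as one byte. For the 100GB example with 1GB parts, it produces 100 parts.

```python
from typing import List, Tuple

MIN_PART_SIZE = 5_000_000        # 5MB, assuming decimal units
MAX_PART_SIZE = 5_000_000_000    # 5GB, assuming decimal units


def plan_parts(file_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Split file_size bytes into (part_number, offset, length) tuples."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5MB and 5GB")
    parts = []
    offset = 0
    while offset < file_size:
        # Full-size parts until the end; the last part gets the remainder.
        length = min(part_size, file_size - offset)
        parts.append((len(parts) + 1, offset, length))
        offset += length
    return parts
```

A file no larger than `part_size` comes back as a single part, which is too few for a large file; such a file can be uploaded as a normal file instead.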

Rather than hard coding the part size, it is a good idea to use the recommendedPartSize returned by b2_authorize_account.

The parts are numbered starting at 1, up to the number of parts needed, with a maximum of 10,000 parts for one large file.
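The 10,000-part limit interacts with the part size: a 10TB file split into 10,000 parts needs an average part size of at least 1GB. A sketch of picking a part size, assuming `recommended_part_size` is the recommendedPartSize value from the b2_authorize_account response and using decimal units; `choose_part_size` is a hypothetical helper.

```python
MAX_PARTS = 10_000
MAX_PART_SIZE = 5_000_000_000  # 5GB, assuming decimal units


def choose_part_size(file_size: int, recommended_part_size: int) -> int:
    """Use recommendedPartSize unless 10,000 parts of that size are too few."""
    # Smallest part size that fits the file in MAX_PARTS parts (ceiling division).
    needed = -(-file_size // MAX_PARTS)
    part_size = max(recommended_part_size, needed)
    if part_size > MAX_PART_SIZE:
        raise ValueError("file cannot fit in 10,000 parts of at most 5GB")
    return part_size
```

With a 100MB recommendation, files up to 1TB keep the recommended size; larger files get proportionally larger parts.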

Use b2_get_upload_part_url to get the target URL for uploading parts. Each thread that uploads parts should get its own URL.
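One way to give each uploading thread its own URL is to fetch it lazily per thread with `threading.local`. In this sketch, `get_upload_part_url` and `upload_part` are hypothetical callables: the first wraps a b2_get_upload_part_url call for the large file, and the second uploads one part and returns its SHA1.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def upload_parts_in_parallel(parts: List[Tuple[int, bytes]],
                             get_upload_part_url: Callable[[], dict],
                             upload_part: Callable[[dict, int, bytes], str],
                             threads: int = 4) -> Dict[int, str]:
    """Upload (part_number, data) pairs; return {part_number: sha1}."""
    local = threading.local()

    def worker(item: Tuple[int, bytes]) -> Tuple[int, str]:
        # Each worker thread fetches its own upload part URL once, then reuses it.
        if not hasattr(local, "url_info"):
            local.url_info = get_upload_part_url()
        number, data = item
        return number, upload_part(local.url_info, number, data)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Any exception raised by a worker is re-raised here.
        return dict(pool.map(worker, parts))
```

A production uploader would typically also fetch a fresh URL for a thread when an upload to its current URL fails.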

Upload a part using b2_upload_part and provide the file ID of the large file, the part number, and the data in the part.
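A sketch of building a b2_upload_part request with the standard library, assuming `upload_url` and `upload_auth_token` are the `uploadUrl` and `authorizationToken` returned by b2_get_upload_part_url. The part number and SHA1 travel in the `X-Bz-Part-Number` and `X-Bz-Content-Sha1` headers. `upload_part_request` is a hypothetical name, and the request is not sent.

```python
import hashlib
import urllib.request


def upload_part_request(upload_url: str, upload_auth_token: str,
                        part_number: int, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a b2_upload_part request for one part."""
    return urllib.request.Request(
        upload_url,  # uploadUrl from b2_get_upload_part_url
        data=data,
        headers={
            "Authorization": upload_auth_token,  # token from the same call
            "X-Bz-Part-Number": str(part_number),
            "Content-Length": str(len(data)),
            "X-Bz-Content-Sha1": hashlib.sha1(data).hexdigest(),
        },
        method="POST",
    )
```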

Copy a part using b2_copy_part and provide the source file ID, the large file ID, the part number, and optionally the range of bytes to copy from the source file.
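The b2_copy_part request body can be sketched as a plain dictionary. The range uses the HTTP byte-range form with an inclusive end offset, so copying the first 5,000,000 bytes of the source is `bytes=0-4999999`. `copy_part_body` is a hypothetical helper; omitting the range copies the entire source file.

```python
from typing import Optional


def copy_part_body(source_file_id: str, large_file_id: str, part_number: int,
                   start: Optional[int] = None,
                   length: Optional[int] = None) -> dict:
    """Build the JSON body for b2_copy_part."""
    body = {
        "sourceFileId": source_file_id,
        "largeFileId": large_file_id,
        "partNumber": part_number,
    }
    if (start is None) != (length is None):
        raise ValueError("give both start and length, or neither")
    if start is not None:
        # HTTP-style byte range with an inclusive end offset.
        body["range"] = f"bytes={start}-{start + length - 1}"
    return body
```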

Finally, once all of the parts have been uploaded or copied, call b2_finish_large_file with the large file ID and the SHA1 of each part, in part-number order, to combine the parts into a single B2 file. Once this is done, it looks just like any other file: you can download it, and it shows up when you list the files in a bucket.
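b2_finish_large_file takes the large file ID and `partSha1Array`, the SHA1 of each part in part-number order, with the SHA1 of part 1 first. This hypothetical helper builds that body from a `{part_number: sha1}` map, such as the one a parallel uploader collects, and, as a choice of this sketch, refuses to proceed if any part number is missing.

```python
from typing import Dict


def finish_large_file_body(file_id: str, part_sha1s: Dict[int, str]) -> dict:
    """Build the b2_finish_large_file body from a {part_number: sha1} map."""
    numbers = sorted(part_sha1s)
    # This sketch refuses to finish if any part number between 1 and N is missing.
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError("parts must be numbered 1..N with no gaps")
    return {"fileId": file_id,
            "partSha1Array": [part_sha1s[n] for n in numbers]}
```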

Managing Large Files In Progress

Any number of large files can be in progress at once. You can use b2_list_unfinished_large_files to get a list of them.

For any one unfinished large file, you can use b2_list_parts to get a list of the parts that have been uploaded so far.
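b2_list_parts returns results a page at a time, with `nextPartNumber` indicating where the next page starts (null when there are no more). A sketch of using it to resume an interrupted upload, where `call` is a hypothetical function that sends a B2 API request and returns the decoded JSON, and `stored_part_numbers` and `missing_parts` are hypothetical helpers:

```python
from typing import Callable, List, Optional, Set

# Hypothetical transport: call(api_name, json_body) -> decoded JSON response.
ApiCall = Callable[[str, dict], dict]


def stored_part_numbers(call: ApiCall, file_id: str) -> Set[int]:
    """Page through b2_list_parts and collect the part numbers already stored."""
    numbers: Set[int] = set()
    start: Optional[int] = 1
    while start is not None:
        resp = call("b2_list_parts", {"fileId": file_id, "startPartNumber": start})
        numbers.update(part["partNumber"] for part in resp["parts"])
        start = resp.get("nextPartNumber")  # None (JSON null) after the last page
    return numbers


def missing_parts(call: ApiCall, file_id: str, total_parts: int) -> List[int]:
    """Part numbers that still need to be uploaded or copied."""
    stored = stored_part_numbers(call, file_id)
    return [n for n in range(1, total_parts + 1) if n not in stored]
```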

If you have started a large file but don't want to finish it, you can use b2_cancel_large_file to delete all of the parts that have been uploaded or copied so far.
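Cleaning up abandoned uploads combines b2_list_unfinished_large_files, which pages with `nextFileId`, and b2_cancel_large_file. A sketch, again with a hypothetical `call` transport and a hypothetical `cancel_unfinished` helper; the `name_prefix` filter is optional.

```python
from typing import Callable, List, Optional

# Hypothetical transport: call(api_name, json_body) -> decoded JSON response.
ApiCall = Callable[[str, dict], dict]


def cancel_unfinished(call: ApiCall, bucket_id: str,
                      name_prefix: Optional[str] = None) -> List[str]:
    """Cancel unfinished large files in a bucket; return the cancelled file IDs."""
    cancelled: List[str] = []
    start_file_id: Optional[str] = None
    while True:
        body = {"bucketId": bucket_id}
        if name_prefix:
            body["namePrefix"] = name_prefix
        if start_file_id is not None:
            body["startFileId"] = start_file_id
        resp = call("b2_list_unfinished_large_files", body)
        for info in resp["files"]:
            # Deletes all parts stored so far for this unfinished file.
            call("b2_cancel_large_file", {"fileId": info["fileId"]})
            cancelled.append(info["fileId"])
        start_file_id = resp.get("nextFileId")  # None after the last page
        if start_file_id is None:
            return cancelled
```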

Usage Charges for Large Files

In most ways, large files are treated the same as small files. API calls cost the same for both.

You will be charged for storage for parts that have been uploaded or copied. Usage is counted from the time the part is stored. When you call b2_finish_large_file, the parts are combined into one big file, but the number of bytes stored remains the same, so it doesn't affect the charge for the storage.

SHA1 Checksums

Large files do not require a SHA1 checksum for the entire file, but Backblaze recommends one. If the caller knows the SHA1 of the entire large file being created, Backblaze recommends passing it in the fileInfo of the b2_start_large_file call: add a key named large_file_sha1 whose value is the SHA1 as a 40-character hex string.
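Computing the whole-file SHA1 before calling b2_start_large_file means reading the source once up front. A minimal sketch; `large_file_sha1`, `read_chunks`, and `start_file_info` are hypothetical helpers, and the returned hex digest is always 40 characters.

```python
import hashlib
from typing import Iterable, Iterator, Optional


def large_file_sha1(chunks: Iterable[bytes]) -> str:
    """SHA1 of the whole file, fed chunk by chunk; returns 40 hex characters."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a local file's contents in chunk_size pieces (1 MiB by default)."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(chunk_size), b"")


def start_file_info(sha1_hex: str, extra: Optional[dict] = None) -> dict:
    """fileInfo for b2_start_large_file, including the large_file_sha1 key."""
    info = dict(extra or {})
    info["large_file_sha1"] = sha1_hex
    return info
```

For a local file, `start_file_info(large_file_sha1(read_chunks(path)))` yields a fileInfo dictionary ready for b2_start_large_file.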

Each part of a large file requires a SHA1. When uploading a part, you must provide its SHA1 checksum, which is used to validate the data sent and is stored internally with that part.
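If the parts are read sequentially from one source, a single pass can produce both the per-part SHA1s needed for b2_upload_part and the whole-file SHA1 for large_file_sha1. A sketch, assuming `read` behaves like a binary file's `read` method and returns fewer bytes than requested only at end of input; `part_and_file_sha1s` is a hypothetical name.

```python
import hashlib
from typing import Callable, List, Tuple


def part_and_file_sha1s(read: Callable[[int], bytes],
                        part_size: int) -> Tuple[List[str], str]:
    """Read a source sequentially; return (per-part SHA1s, whole-file SHA1)."""
    whole = hashlib.sha1()
    part_sha1s: List[str] = []
    while True:
        data = read(part_size)  # one part at a time, at most part_size bytes
        if not data:
            break
        whole.update(data)
        part_sha1s.append(hashlib.sha1(data).hexdigest())
    return part_sha1s, whole.hexdigest()
```

Because the whole-file SHA1 must be known when the file is started, this pass runs before b2_start_large_file, for example as `part_and_file_sha1s(f.read, part_size)` on a file opened with `open(path, "rb")`. It holds at most one part in memory at a time.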

When you download a large file, or a range of a large file, there is no checksum for the entire file, so the string "none" is returned in the X-Bz-Content-Sha1 header.

Accessing Large Files

Once a large file is created, you can do anything with it that you can do with a normal file.

When downloading large files, the Range header can be especially useful. It lets you download just part of the file. For details, see b2_download_file_by_name and b2_download_file_by_id.
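A ranged download by name can be sketched as a standard HTTP GET to `{downloadUrl}/file/{bucketName}/{fileName}`, assuming `download_url` is the `downloadUrl` from b2_authorize_account. The Range end offset is inclusive, and the Authorization header is only needed for private buckets. `ranged_download_request` is a hypothetical helper that builds the request without sending it.

```python
import urllib.parse
import urllib.request
from typing import Optional


def ranged_download_request(download_url: str, bucket_name: str, file_name: str,
                            first_byte: int, last_byte: int,
                            auth_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a download-by-name request for a byte range."""
    # quote() percent-encodes the name but leaves "/" separators intact.
    url = f"{download_url}/file/{bucket_name}/{urllib.parse.quote(file_name)}"
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}  # inclusive end
    if auth_token is not None:
        headers["Authorization"] = auth_token  # needed for private buckets
    return urllib.request.Request(url, headers=headers)
```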

Limits

Large files can range in size from 5MB to 10TB.

Each large file must consist of at least 2 parts, and all of the parts except the last one must be at least 5MB in size. The last part must contain at least one byte.
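These limits can be checked before any API call is made. A sketch of such a check over a list of planned part sizes, treating MB, GB, and TB as decimal units (an assumption); `check_part_sizes` is a hypothetical helper that raises `ValueError` on the first violation.

```python
from typing import List

MIN_PART_SIZE = 5_000_000              # 5MB, assuming decimal units
MAX_PART_SIZE = 5_000_000_000          # 5GB
MAX_PARTS = 10_000
MAX_FILE_SIZE = 10_000_000_000_000     # 10TB


def check_part_sizes(sizes: List[int]) -> None:
    """Raise ValueError if the planned part sizes break a large-file limit."""
    if len(sizes) < 2:
        raise ValueError("a large file needs at least 2 parts")
    if len(sizes) > MAX_PARTS:
        raise ValueError("a large file has at most 10,000 parts")
    for number, size in enumerate(sizes, start=1):
        # Only the last part may be smaller than 5MB, down to one byte.
        low = 1 if number == len(sizes) else MIN_PART_SIZE
        if not low <= size <= MAX_PART_SIZE:
            raise ValueError(f"part {number} has invalid size {size}")
    if sum(sizes) > MAX_FILE_SIZE:
        raise ValueError("large files are limited to 10TB")
```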