Large Files
Large files can range from 5MB all the way up to 10TB, well beyond the 5GB limit on normal files.
The 5GB limit on normal files exists primarily because that is about how much you can reliably upload with a single HTTP request in a reasonable amount of time. Large files are created by assembling parts, where each part can either be uploaded or copied from an existing file in any bucket belonging to the same account as the large file. A single large file can therefore be assembled from a mix of uploaded and copied parts.
Parts of a large file can be uploaded and copied in parallel,
which can significantly reduce the time it takes to upload terabytes of
data. Each part can be anywhere from 5MB to 5GB in size, and
you can pick the size that is most convenient for your application.
For best upload performance, we recommend using the recommendedPartSize returned by b2_authorize_account.
Creating Large Files
The steps to create a large file are:
b2_start_large_file,
b2_get_upload_part_url for each thread uploading,
b2_upload_part or b2_copy_part for each part of the file, and
b2_finish_large_file.
The first step is to tell B2 that you're going to create a large file by calling b2_start_large_file and providing the file name, content type, and custom file info. The call will return a file ID for the large file, which you must have when uploading the parts of the file.
Then, you need to decide where your parts are coming from. Each part can either be uploaded, or copied from an existing file in any bucket belonging to the same account as the large file. Note that large files can be assembled from a mix of uploaded and copied parts.
You need to determine the size of each part you upload and copy. For example, if you're uploading a 100GB file, you could make each part 1GB, and perform 100 uploads, 1GB at a time. Note that parts need not all be the same size. The maximum part size is 5GB, and the minimum part size is 5MB, except for the last part in a file, which has a minimum size of one byte. We recommend a part size of 100MB, which strikes a good balance between upload throughput and the ability to upload parts in parallel.
Rather than hard-coding the part size, it is a good idea to use the recommendedPartSize returned by b2_authorize_account.
The parts are numbered starting at 1, up to the number of parts needed, with a maximum of 10,000 parts for one large file.
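The part-size rules above can be sketched as a small planning helper. This is a minimal illustration, not part of the B2 API: the function name plan_parts is hypothetical, and the constants assume decimal units (1MB = 1,000,000 bytes). In practice the default part size would come from recommendedPartSize rather than the 100MB used here.

```python
MB = 1000 * 1000          # assumption: decimal megabytes
GB = 1000 * MB
MIN_PART_SIZE = 5 * MB    # minimum for every part except the last
MAX_PART_SIZE = 5 * GB
MAX_PARTS = 10_000


def plan_parts(total_size, part_size=100 * MB):
    """Return a list of (part_number, offset, length) covering total_size bytes.

    Part numbers start at 1. Only the last part may be shorter than part_size.
    """
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5MB and 5GB")
    if total_size <= part_size:
        raise ValueError("data fits in one part; a large file needs at least 2")
    count = -(-total_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("more than 10,000 parts; choose a larger part size")
    parts = []
    for index in range(count):
        offset = index * part_size
        length = min(part_size, total_size - offset)
        parts.append((index + 1, offset, length))
    return parts
```

For the 100GB example above, plan_parts(100 * GB, 1 * GB) yields 100 parts of 1GB each, numbered 1 through 100.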
Use b2_get_upload_part_url to get the target for uploading parts. Each thread doing uploading should get its own URL.
Upload a part using b2_upload_part and provide the file ID of the large file, the part number, and the data in the part.
Copy a part using b2_copy_part and provide the source file ID, the large file ID, the part number, and optionally the range of bytes to copy over from the source file.
Finally, once all of the parts are uploaded or copied, you can call b2_finish_large_file to transform the parts into a single B2 file. Once this is done, it looks just like any other file. You can download it, and it will show up when you list the files in a bucket.
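The sequence of calls described in this section can be sketched as follows. This is an outline under stated assumptions, not a working B2 client: the client object and its method names are hypothetical wrappers named after the corresponding API operations, and their arguments and return values are simplified for illustration. Passing the list of part checksums to the finish call is also an assumption of this sketch; check the b2_finish_large_file reference for the exact request.

```python
import hashlib


def create_large_file(client, bucket_id, file_name, content_type, parts):
    """Upload `parts` (an iterable of bytes objects, in order) as one large file.

    `client` is a hypothetical object exposing one method per API operation.
    This version is single-threaded; with several threads, each thread would
    request its own upload target.
    """
    # Step 1: start the large file and remember its file ID.
    file_id = client.start_large_file(bucket_id, file_name, content_type, {})

    # Step 2: get a target for uploading parts.
    target = client.get_upload_part_url(file_id)

    # Step 3: upload each part, numbered from 1, with its SHA1 checksum.
    part_sha1s = []
    for part_number, data in enumerate(parts, start=1):
        sha1 = hashlib.sha1(data).hexdigest()
        client.upload_part(target, part_number, data, sha1)
        part_sha1s.append(sha1)

    # Step 4: turn the parts into a single file.
    return client.finish_large_file(file_id, part_sha1s)
```

Copied parts would follow the same pattern, with a b2_copy_part wrapper called in place of upload_part for the parts that come from existing files.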
Managing Large Files In Progress
Any number of large files can be in progress at once. You can use b2_list_unfinished_large_files to get a list of them.
For any one unfinished large file, you can use b2_list_parts to get a list of the parts that have been uploaded so far.
If you have started a large file, but don't want to finish, you can use b2_cancel_large_file to delete all of the parts that have been uploaded so far.
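One way these calls might be used together is to resume an interrupted upload, or to abandon it. The sketch below is illustrative only: the client object and its list_parts and cancel_large_file methods are hypothetical wrappers around the operations named above, and list_parts is assumed to return just the part numbers stored so far.

```python
def missing_parts(uploaded_part_numbers, total_parts):
    """Return the part numbers (1-based) that still need to be uploaded."""
    done = set(uploaded_part_numbers)
    return [n for n in range(1, total_parts + 1) if n not in done]


def resume_or_cancel(client, file_id, total_parts, give_up=False):
    """Return the parts left to upload, or cancel the large file entirely.

    `client` is a hypothetical wrapper; it is not part of the B2 API.
    """
    if give_up:
        # Deletes every part stored so far for this unfinished large file.
        client.cancel_large_file(file_id)
        return []
    return missing_parts(client.list_parts(file_id), total_parts)
```

For example, if parts 1 and 3 of a four-part file are already stored, missing_parts([1, 3], 4) returns [2, 4].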
Usage Charges for Large Files
In most ways, large files are treated the same as small files. The costs for the API calls are the same.
You will be charged for storage for parts that have been uploaded or copied.
Usage is counted from the time the part is stored. When you call b2_finish_large_file, the parts are combined into one big file, but the number of bytes stored remains the same, so it doesn't affect the charge for the storage.
SHA1 Checksums
Large files do not require a SHA1 checksum on the entire file, but Backblaze recommends one.
If the caller knows the SHA1 of the entire large file being created, Backblaze recommends specifying the SHA1 in the fileInfo during the call to b2_start_large_file.
Inside the fileInfo, use the key large_file_sha1, and for the value use a 40-character hex string representing the SHA1.
For each part of the large file, a SHA1 is required. When uploading a part, you must provide a SHA1 checksum, which is used to validate the data sent, and which is stored internally with that part.
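Both kinds of checksum can be computed with Python's standard hashlib module. The sketch below is a minimal illustration; the helper name checksums_for is hypothetical, and it assumes the parts are available as in-memory bytes objects in file order.

```python
import hashlib


def checksums_for(parts):
    """Return (per-part SHA1 hex strings, fileInfo dict with large_file_sha1).

    `parts` is an iterable of bytes objects, in file order.
    """
    whole_file = hashlib.sha1()
    part_sha1s = []
    for data in parts:
        whole_file.update(data)  # running SHA1 over the entire file
        part_sha1s.append(hashlib.sha1(data).hexdigest())  # SHA1 of this part
    file_info = {"large_file_sha1": whole_file.hexdigest()}
    return part_sha1s, file_info
```

Each hex string is 40 characters long. The whole-file value is known only after every part has been read, so to pass it to b2_start_large_file the data has to be hashed before the upload begins.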
When you download a large file, or a range of a large file, there is no checksum for the entire file, so the string none is returned in the X-Bz-Content-Sha1 header.
Accessing Large Files
Once a large file is created, you can do anything you can do with a normal file:
b2_delete_file_version - deletes one version of one file
b2_download_file_by_id - downloads a specific version of a file
b2_download_file_by_name - downloads the most recent version of a file
b2_get_file_info - returns information about a file
b2_hide_file - hides a file, without deleting its data
b2_list_file_names - lists the file names in a bucket
b2_list_file_versions - lists all of the file versions in a bucket
When downloading large files, the Range header can be especially useful. It lets you download just part of the file. For details, see b2_download_file_by_name and b2_download_file_by_id.
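As an illustration of ranged downloads, the helper below generates Range header values that together cover a file of known size. It is a sketch only: the function name range_headers is hypothetical, and it relies on the standard HTTP byte-range syntax, in which both ends of the range are inclusive.

```python
def range_headers(total_size, chunk_size):
    """Yield Range header values that cover total_size bytes in order."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # inclusive end offset
        yield f"bytes={start}-{end}"
```

For a 250-byte file fetched in 100-byte chunks, this produces bytes=0-99, bytes=100-199, and bytes=200-249. Each value would be sent as the Range header of a separate download request.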
Limits
Large files can range in size from 5MB to 10TB.
Each large file must consist of at least 2 parts, and all of the parts except the last one must be at least 5MB in size. The last part must contain at least one byte.
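Because parts need not all be the same size, it can help to check a planned list of part sizes against these limits before starting. The sketch below is illustrative: the function name check_part_sizes is hypothetical, and the constants assume decimal units (1MB = 1,000,000 bytes).

```python
MB = 1000 * 1000            # assumption: decimal units
GB = 1000 * MB
TB = 1000 * GB


def check_part_sizes(sizes):
    """Return a list of problems found in a planned list of part sizes (bytes)."""
    problems = []
    if len(sizes) < 2:
        problems.append("a large file needs at least 2 parts")
    if len(sizes) > 10_000:
        problems.append("a large file can have at most 10,000 parts")
    for number, size in enumerate(sizes, start=1):
        is_last = number == len(sizes)
        minimum = 1 if is_last else 5 * MB  # only the last part may be small
        if size < minimum:
            problems.append(f"part {number} is smaller than {minimum} bytes")
        if size > 5 * GB:
            problems.append(f"part {number} is larger than 5GB")
    if sum(sizes) > 10 * TB:
        problems.append("total size exceeds 10TB")
    return problems
```

An empty list means the plan satisfies every limit listed here; for example, check_part_sizes([5 * MB, 1]) returns no problems.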