A bucket holds files. There is no hierarchy of folders, just one flat list of file names. For example, a bucket might have four files in it. The name of the first file is cats/cat.jpg, and the name of the second file is cats/kitten.jpg. There is no folder for cats; the text cats/ is simply part of each file name.
Even though there are no folders, many of the tools that work with files in a bucket act as if there are. The file browser on the Backblaze website acts as if there are folders, and so does the Backblaze B2 command-line tool. Under the covers, both tools scan through the flat list of files and simulate folders. The following example lists the contents of a bucket with the command-line tool:
    $ b2 ls my_bucket
    cats/
    dogs/
    $ b2 ls my_bucket cats
    cat.jpg
    kitten.jpg
Backblaze recommends that you use "/" to separate folder names, in the same way that "/" (or "\" on Windows) separates folder names in file paths on your computer. That way, the tools that you use can determine the implied folder structure.
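To illustrate how a tool can simulate folders over a flat list, here is a minimal sketch in Python. The function `list_folder` is a hypothetical helper, not part of any Backblaze tool; it only assumes that "/" is the delimiter. The two dogs/ file names are invented for the example, because only the two cats/ names are given above. The prefix and delimiter inputs are similar in spirit to the `prefix` and `delimiter` parameters of the native `b2_list_file_names` call.

```python
def list_folder(file_names, prefix="", delimiter="/"):
    """Hypothetical helper: simulate one folder level over a flat list of names.

    Returns (folders, files). A name that continues past the delimiter after
    the prefix implies a sub-folder; a name that does not is a file here.
    """
    folders = set()
    files = []
    for name in sorted(file_names):
        if not name.startswith(prefix):
            continue  # not inside the simulated folder
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Only the part up to the next delimiter becomes a folder entry.
            folders.add(prefix + head + delimiter)
        else:
            files.append(name)
    return sorted(folders), files


# Hypothetical bucket contents; the dogs/ names are made up for illustration.
bucket = ["cats/cat.jpg", "cats/kitten.jpg", "dogs/dog.jpg", "dogs/puppy.jpg"]
print(list_folder(bucket))           # (['cats/', 'dogs/'], [])
print(list_folder(bucket, "cats/"))  # ([], ['cats/cat.jpg', 'cats/kitten.jpg'])
```

Because the folders are derived from the names on every listing, nothing needs to be created or deleted to "make" a folder; uploading a file whose name starts with a new prefix is enough.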
Each file has information associated with it, in addition to the sequence of bytes that the file contains. Every file has a size (the number of bytes in the file), a MIME type, and a SHA-1 checksum. You can also add your own custom information.
When you upload a file to Backblaze B2, you should provide a SHA-1 hash of the file's contents in an HTTP request header. The SHA-1 lets Backblaze B2 confirm that the content it stores matches the content that you sent: if the data is corrupted on the network on its way to Backblaze B2, the mismatch is detected before the file is stored. When you download a file, its SHA-1 checksum is returned with it so that you can verify that the data you received is intact.
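A minimal sketch of computing the SHA-1 on the client side, using Python's standard `hashlib` module. The helper names `sha1_hex` and `verify_download` are hypothetical. Reading in chunks keeps memory use flat for large files. On upload the hex digest goes in the `X-Bz-Content-Sha1` request header; on download, compare the received bytes against the checksum returned with the file.

```python
import hashlib
import io


def sha1_hex(stream, chunk_size=1 << 20):
    """Hypothetical helper: SHA-1 of a binary stream as 40 lowercase hex digits."""
    digest = hashlib.sha1()
    # Read fixed-size chunks until read() returns an empty bytes object.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(data, expected_sha1):
    """Hypothetical helper: True if downloaded bytes match the returned checksum."""
    return hashlib.sha1(data).hexdigest() == expected_sha1.lower()


content = b"hello world"
checksum = sha1_hex(io.BytesIO(content))
print(checksum)  # 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
print(verify_download(content, checksum))  # True
```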
Backblaze B2 also saves the SHA-1 for later use. When you request the file for download, the original SHA-1 is compared with the SHA-1 of the file as it is reassembled from the Backblaze Vault. If they do not match, the file is recreated.
While the hash is optional, Backblaze strongly recommends that you provide the SHA-1 to guarantee that the file you upload matches what is saved in Backblaze B2. In some use cases, such as video, audio, and photographs, a flipped bit is not fatal. However, if you stream an encrypted file from one provider to another and a single bit of that file flips, its data can become meaningless.
In practice, bits do flip randomly during TCP/IP transmission. The reasons are summarized in this article. TCP/IP packets regularly fail checksum tests, and the 16-bit TCP checksum is not strong enough to detect every corruption.
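The 16-bit checksum that TCP uses is a ones'-complement sum of 16-bit words, described in RFC 1071. The sketch below implements that sum in Python and shows two single-bit flips that cancel each other out: the low bit of byte 1 goes from 0 to 1 (0x42 to 0x43) and the low bit of byte 3 goes from 1 to 0 (0x43 to 0x42). The 16-bit checksum is unchanged, while the SHA-1 differs. The helper `internet_checksum` is written for illustration only.

```python
import hashlib


def internet_checksum(data):
    """Illustrative 16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # add the next big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return ~total & 0xFFFF


original = b"ABCC"   # words 0x4142 and 0x4343
corrupted = b"ACCB"  # words 0x4143 and 0x4342: two bits flipped

print(internet_checksum(original) == internet_checksum(corrupted))  # True
print(hashlib.sha1(original).hexdigest() == hashlib.sha1(corrupted).hexdigest())  # False
```

Any pair of flips that adds 1 to one word and subtracts 1 from another leaves a plain sum unchanged, which is why an end-to-end cryptographic hash such as SHA-1 is the stronger check.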
If you do not provide the SHA-1, you should still compute it before or during the upload and then check that it matches the SHA-1 in the upload response. If it does not match, delete the file that you uploaded and retry the upload. The SHA-1 in the response is the value that Backblaze B2 stores and uses to protect the file against bit flips.
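The check-delete-retry loop can be sketched as follows. This is a minimal sketch, not an official client: `upload` and `delete` are hypothetical callables that you would implement around the native API calls. It assumes `upload` returns the parsed JSON response of `b2_upload_file`, which includes `fileName`, `fileId`, and `contentSha1`, and that `delete` wraps `b2_delete_file_version`, which takes a file name and file ID.

```python
import hashlib


def upload_with_verification(content, upload, delete, max_attempts=3):
    """Hypothetical helper: upload, compare SHA-1 values, delete and retry on mismatch.

    upload(content) -> dict with "fileName", "fileId", "contentSha1" (assumed shape)
    delete(file_name, file_id) -> removes that stored file version
    """
    local_sha1 = hashlib.sha1(content).hexdigest()
    for _ in range(max_attempts):
        response = upload(content)
        if response["contentSha1"] == local_sha1:
            return response  # what was stored matches what was sent
        # The stored bytes differ from ours: remove that version, then retry.
        delete(response["fileName"], response["fileId"])
    raise RuntimeError(f"SHA-1 mismatch after {max_attempts} attempts")
```

Passing the API calls in as functions keeps the verification logic separate from HTTP details, so it can be exercised with fake uploads in tests.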
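Instead of sending the SHA-1 in a header, you can append it to the end of the request body. To do this with the native API, set the X-Bz-Content-Sha1 header to hex_digits_at_end and send the 40 hexadecimal digits of the SHA-1 after the file content. This is useful when you compute the hash while streaming the upload, because the hash is not known until all of the data has been read.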
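A minimal sketch of building such a request body, assuming the `hex_digits_at_end` convention of the native `b2_upload_file` call: the file bytes are followed by the 40 hex digits, and `Content-Length` counts those extra 40 bytes. The helpers `upload_headers` and `body_with_trailing_sha1` are hypothetical; the generator hashes each chunk as it passes through, so the data is read only once.

```python
import hashlib

SHA1_HEX_LEN = 40  # a SHA-1 digest is 40 hexadecimal characters


def upload_headers(file_size):
    """Hypothetical helper: headers for a body whose last 40 bytes are the SHA-1."""
    return {
        "X-Bz-Content-Sha1": "hex_digits_at_end",
        # Content-Length covers the file bytes plus the trailing hex digits.
        "Content-Length": str(file_size + SHA1_HEX_LEN),
    }


def body_with_trailing_sha1(chunks):
    """Hypothetical helper: yield chunks unchanged while hashing, then the hex SHA-1."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
        yield chunk
    yield digest.hexdigest().encode("ascii")


headers = upload_headers(11)
body = b"".join(body_with_trailing_sha1([b"hello ", b"world"]))
print(headers["Content-Length"])  # 51
print(body)  # b'hello world2aae6c35c94fcfb415dbe95f408b9ce91ee846ed'
```

Note that the file size must still be known in advance, because `Content-Length` is sent before the body.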
For more information, see Upload Files with the Native API.
When you upload a file, you also provide a MIME type for it. A browser that downloads the file uses the MIME type to determine what kind of file it is. For example, if you specify that your file kitten.jpg has a MIME type of image/jpeg, a browser that downloads the file knows that it is an image to be displayed.
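One way to choose the MIME type before uploading is to guess it from the file extension with Python's standard `mimetypes` module. This is a sketch: `content_type_for` is a hypothetical helper, and falling back to `application/octet-stream` (generic binary data) for unknown extensions is a design choice, not a Backblaze requirement. The result would be sent as the upload's `Content-Type` header.

```python
import mimetypes


def content_type_for(file_name, default="application/octet-stream"):
    """Hypothetical helper: guess a MIME type from the extension, with a fallback."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    return guessed or default


print(content_type_for("cats/kitten.jpg"))   # image/jpeg
print(content_type_for("notes.unknownext"))  # application/octet-stream
```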
HTTP Header Size Limit
The file name and file information must fit, along with the other necessary headers, within the 8 KB limit that some web servers and proxies impose. To ensure this, both now and in the future, Backblaze B2 limits the combined header size for all of the file information. Two limits are possible, depending on the features that a file uses.
- In most cases, Backblaze B2 limits the combined header size for the file name and all file information to 7,000 bytes. This limit applies to the fully encoded HTTP header line, including the carriage-return and newline. The header line below is counted as 40 bytes.
- Newer features of the Backblaze B2 API require additional headers. For files that are encrypted with server-side encryption or files that are in Object Lock-enabled buckets, the limit is reduced to 2,048 bytes to ensure sufficient space for additional response headers. This limit applies only to the names and values of the file information headers. The header line below is counted as 36 bytes.
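Before uploading, a client might check its headers against these limits. The sketch below is hypothetical and rests on stated assumptions: the file name travels in an `X-Bz-File-Name` header, each custom item in an `X-Bz-Info-<key>` header, values are percent-encoded with `urllib.parse.quote`, and for the 2,048-byte limit only the info header names and values are counted, without the ": " separator or the line ending. Check the Backblaze API reference for the exact counting and encoding rules.

```python
from urllib.parse import quote

GENERAL_LIMIT = 7000     # bytes: file name plus all file info, as full header lines
RESTRICTED_LIMIT = 2048  # bytes: file info names and values (SSE or Object Lock)


def header_line_size(name, value):
    """Hypothetical helper: bytes in 'Name: value\\r\\n' with a percent-encoded value."""
    return len(f"{name}: {quote(value)}\r\n".encode("ascii"))


def fits_header_limit(file_name, file_info, restricted=False):
    """Hypothetical pre-upload check of the file name and custom file info."""
    names = [f"X-Bz-Info-{key}" for key in file_info]  # assumed header naming
    values = list(file_info.values())
    if restricted:
        # Assumption: only info header names and values count (no ': ', no CRLF).
        size = sum(len(n) + len(quote(v)) for n, v in zip(names, values))
        return size <= RESTRICTED_LIMIT
    # Assumption: the full encoded lines for the file name and every info header count.
    size = header_line_size("X-Bz-File-Name", file_name)
    size += sum(header_line_size(n, v) for n, v in zip(names, values))
    return size <= GENERAL_LIMIT


print(header_line_size("X-Bz-Info-Color", "blue"))  # 23
print(fits_header_limit("cats/kitten.jpg", {"color": "blue"}))  # True
```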