Go Wild with Wildcards in the Backblaze B2 Command Line Tool 3.7.1

File transfer tools such as Cyberduck, FileZilla Pro, and Transmit implement a graphical user interface (GUI), which allows users to manage and transfer files across local storage and any number of services, including cloud object stores such as Backblaze B2 Cloud Storage. Some tasks, however, require a little more power and flexibility than a GUI can provide. This is where a command line interface (CLI) shines. A CLI typically provides finer control over operations than a GUI tool, and makes it straightforward to automate repetitive tasks.

We recently released version 3.7.0 (and then, shortly thereafter, version 3.7.1) of the Backblaze B2 Command Line Tool, alongside version 1.19.0 of the underlying Backblaze B2 Python SDK. Let’s take a look at the highlights in the new releases, and why you might want to use the Backblaze B2 CLI rather than the AWS equivalent.

Battle of the CLIs: Backblaze B2 vs. AWS

As you almost certainly already know, Backblaze B2 has an S3-compatible API in addition to its original API, now known as the B2 Native API. In most cases, we recommend using the S3-compatible API, since a rich ecosystem of S3 tools and knowledge has evolved over the years.

While the AWS CLI works perfectly well with Backblaze B2, and we explain how to use it in our B2 Developer Quick-Start Guide, it’s slightly clunky. The AWS CLI allows you to set your access key id and secret access key via either environment variables or a configuration file, but you must override the default endpoint on the command line with every command, like this:

% aws --endpoint-url https://s3.us-west-004.backblazeb2.com s3api \
list-buckets

This is very tiresome if you’re working interactively at the command line! In contrast, the B2 CLI retrieves the correct endpoint from Backblaze B2 when it authenticates, so the command line is much more concise:

% b2 list-buckets

Additionally, the B2 CLI provides fine-grained access to Backblaze B2-specific functionality, such as application key management and replication.
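
If you do need the AWS CLI with Backblaze B2, one way to avoid retyping the endpoint is a small shell function. This is only a sketch: the function name b2aws and the variable B2_S3_ENDPOINT are made up for illustration, and the endpoint shown is the one from the example above, so substitute your own bucket's endpoint.

```shell
# Hypothetical wrapper: "b2aws" and B2_S3_ENDPOINT are illustrative names,
# not part of the AWS CLI or the B2 CLI.
B2_S3_ENDPOINT="https://s3.us-west-004.backblazeb2.com"

b2aws() {
    # Pass every argument through to aws, adding the endpoint override.
    aws --endpoint-url "$B2_S3_ENDPOINT" "$@"
}
```

With that in your shell profile, b2aws s3api list-buckets runs the same command as the longer form shown earlier.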

Automating Common Tasks with the B2 Command Line Tool

If you’re already familiar with CLI tools, feel free to skip to the next section.

Imagine you’ve uploaded a large number of WAV files to a Backblaze B2 Bucket for transcoding into .mp3 format. Once the transcoding is complete, and you’ve reviewed a sample of the .mp3 files, you decide that you can delete the .wav files. You can do this in a GUI tool, opening the bucket, navigating to the correct location, sorting the files by extension, selecting all of the .wav files, and deleting them. However, the CLI can do this in a single command:

% b2 rm --withWildcard --recursive my-bucket 'audio/*.wav'

If you want to be sure you’re deleting the correct files, you can add the --dryRun option to show the files that would be deleted, rather than actually deleting them:

% b2 rm --dryRun --withWildcard --recursive my-bucket 'audio/*.wav'
audio/aardvark.wav
audio/barracuda.wav
...
audio/yak.wav
audio/zebra.wav
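
Because the dry run and the real delete take the same arguments, the two steps are easy to script. The sketch below is one possible approach rather than a built-in feature: preview_then_rm and its MAX_FILES limit are hypothetical, and it assumes the dry run prints one file name per line, as in the output above.

```shell
# preview_then_rm is a hypothetical helper, not a B2 CLI command.
# Usage: preview_then_rm BUCKET PATTERN MAX_FILES
preview_then_rm() (
    bucket=$1
    pattern=$2
    max_files=$3

    # List what would be deleted without deleting anything.
    matches=$(b2 rm --dryRun --withWildcard --recursive "$bucket" "$pattern") || exit 1

    # Count non-empty lines; grep exits non-zero when the count is 0, hence "|| true".
    count=$(printf '%s\n' "$matches" | grep -c . || true)

    if [ "$count" -eq 0 ]; then
        echo "No files match $pattern" >&2
        exit 1
    fi
    if [ "$count" -gt "$max_files" ]; then
        echo "Refusing to delete $count files (limit is $max_files)" >&2
        exit 1
    fi

    # The preview is within the limit, so run the real delete.
    echo "Deleting $count files matching $pattern" >&2
    b2 rm --withWildcard --recursive "$bucket" "$pattern"
)
```

For example, preview_then_rm my-bucket 'audio/*.wav' 500 would delete the .wav files only if the dry run reports 500 or fewer of them. Files added to the bucket between the preview and the delete are not covered by the check.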

You can find a complete list of the CLI’s commands and their options in the documentation.

Let’s take a look at what’s new in the latest release of the Backblaze B2 CLI.

Major Changes in B2 Command Line Tool Version 3.7.0

New rm command

The most significant addition in 3.7.0 is a whole new command: rm. As you might expect, rm removes files. The CLI has always included the low-level delete-file-version command (to delete a single file version) but you had to call that multiple times and combine it with other commands to remove all versions of a file, or to remove all files with a given prefix.

The new rm command is significantly more powerful, allowing you to delete all versions of a file in a single command:

% b2 rm --versions --withWildcard --recursive my-bucket \
images/san-mateo.png

Let’s unpack that command:

  • %: represents the command shell’s prompt. (You don’t type this.)
  • b2: the B2 CLI executable.
  • rm: the command we’re running.
  • --versions: apply the command to all versions. Omitting this option applies the command to just the most recent version.
  • --withWildcard: treat the folderName argument as a pattern to match the file name.
  • --recursive: descend into all folders. (This is required with --withWildcard.)
  • my-bucket: the bucket name.
  • images/san-mateo.png: the file to be deleted. There are no wildcard characters in the pattern, so the file name must match exactly. Note: there is no leading ‘/’ in Backblaze B2 file names.

As mentioned above, the --dryRun argument allows you to see what files would be deleted, without actually deleting them. Here it is with the ‘*’ wildcard to apply the command to all versions of the .png files under images/. Note the use of quotes to avoid the command shell expanding the wildcard:

% b2 rm --dryRun --versions --withWildcard --recursive my-bucket \
'images/*.png'
images/amsterdam.png
images/sacramento.png

DANGER ZONE: by omitting --withWildcard and the folderName argument, you can delete all of the files in a bucket. We strongly recommend you use --dryRun first, to check that you will be deleting the correct files.

% b2 rm --dryRun --versions --recursive my-bucket
index.html
images/amsterdam.png
images/phoenix.jpeg
images/sacramento.png
stylesheets/style.css
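
The quotes around the wildcard pattern matter more than they might seem. The following sketch needs no Backblaze B2 access at all; it creates a throwaway local directory and shows what a command actually receives with and without quotes. The helper names show_args and demo_quoting are made up for this illustration.

```shell
# show_args and demo_quoting are illustrative helpers, not part of the B2 CLI.
show_args() {
    # Print each argument on its own line, as a command would receive it.
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

demo_quoting() (
    # Work in a temporary directory that happens to contain matching files.
    tmp=$(mktemp -d) || exit 1
    cd "$tmp" || exit 1
    mkdir images
    touch images/amsterdam.png images/sacramento.png

    echo "unquoted:"
    show_args images/*.png      # the shell expands this to the local file names
    echo "quoted:"
    show_args 'images/*.png'    # the pattern reaches the command unchanged

    # Clean up the temporary directory.
    cd / && rm -rf "$tmp"
)
```

Running demo_quoting shows two arguments in the unquoted case and a single images/*.png argument in the quoted case. If your current directory happens to contain matching files, an unquoted pattern hands b2 those local names instead of the pattern you intended.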

New --withWildcard option for the ls command

The ls command gains the --withWildcard option, which behaves exactly as described above for rm. In fact, b2 rm --dryRun --withWildcard --recursive executes the exact same code as b2 ls --withWildcard --recursive. For example:

% b2 ls --withWildcard --recursive my-bucket 'images/*.png'
images/amsterdam.png
images/sacramento.png

You can combine --withWildcard with any of the existing options for ls, for example --long:

% b2 ls --long --withWildcard --recursive my-bucket 'images/*.png'
4_z71d55dummyid381234ed0c1b_f108f1dummyid163b_d2dummyid_m165048_c004
_v0402014_t0016_u01dummyid48198  upload  2023-02-09  16:50:48     714686  
images/amsterdam.png
4_z71d55dummyid381234ed0c1b_f1149bdummyid1141_d2dummyid_m165048_c004
_v0402010_t0048_u01dummyid48908  upload  2023-02-09  16:50:48     549261  
images/sacramento.png
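
The --long output lends itself to quick scripting. As a rough sketch, the function below adds up the sizes of all files matching a pattern. It assumes that the real output has one file per line (the listing above is wrapped to fit the page) with the size in the fifth whitespace-separated column; total_size is a hypothetical name, not a CLI command.

```shell
# total_size is a hypothetical helper, not a B2 CLI command.
# Assumes one line per file: fileId, action, date, time, size, name.
# Usage: total_size BUCKET PATTERN
total_size() {
    b2 ls --long --withWildcard --recursive "$1" "$2" |
        awk '{ total += $5 } END { print total + 0 }'
}
```

For the two .png files listed above, total_size my-bucket 'images/*.png' would print the sum of 714686 and 549261.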

New --incrementalMode option for upload-file and sync

The new --incrementalMode option saves time and bandwidth when working with files that grow over time, such as log files, by only uploading the changes since the last upload. When you use the --incrementalMode option with upload-file or sync, the B2 CLI looks for an existing file in the bucket with the b2FileName that you supplied, and notes both its length and SHA-1 digest. Let’s call that length l. The CLI then calculates the SHA-1 digest of the first l bytes of the local file. If the digests match, then the CLI can instruct Backblaze B2 to create a new file comprising the existing file and the remaining bytes of the local file.
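
The prefix check at the heart of this can be sketched in a few lines of shell. This is an illustration of the idea only, not the CLI's actual implementation: prefix_matches is a made-up helper, and it assumes the sha1sum and head utilities are available.

```shell
# prefix_matches is a hypothetical helper that mirrors the check described
# above; it is not how the B2 CLI is implemented internally.
# Usage: prefix_matches LOCAL_FILE REMOTE_LENGTH REMOTE_SHA1
prefix_matches() (
    local_file=$1
    remote_length=$2
    remote_sha1=$3

    [ -f "$local_file" ] || exit 1

    # The local file must be at least as long as the existing remote file.
    local_length=$(wc -c < "$local_file" | tr -d ' ')
    [ "$local_length" -ge "$remote_length" ] || exit 1

    # SHA-1 digest of the first REMOTE_LENGTH bytes of the local file.
    prefix_sha1=$(head -c "$remote_length" "$local_file" | sha1sum | cut -d' ' -f1)

    # Succeed only if the local prefix is identical to the remote file.
    [ "$prefix_sha1" = "$remote_sha1" ]
)
```

The function succeeds only when the remote file is a byte-for-byte prefix of the local one, which is the condition under which an incremental upload makes sense.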

That was a bit complicated, so let’s look at a concrete example. My web server appends log data to a file, access.log. I’ll see how big it is, get its SHA-1 digest, and upload it to a B2 Bucket:

% ls -l access.log
-rw-r--r--  1 ppatterson  staff  5525849 Feb  9 15:55 access.log

% sha1sum access.log
ff46904e56c7f9083a4074ea3d92f9be2186bc2b  access.log 

The upload-file command outputs all of the file’s metadata, but we’ll focus on the SHA-1 digest, file info, and size.

% b2 upload-file my-bucket access.log access.log
...
{
...
    "contentSha1": "ff46904e56c7f9083a4074ea3d92f9be2186bc2b",
...
    "fileInfo": {
        "src_last_modified_millis": "1675986940381"
    },
...
    "size": 5525849,
...
}

As you might expect, the digest and size match those of the local file.

Time passes, and our log file grows. I’ll first upload it as a different file, so that we can see the default behavior when the B2 Cloud Storage file is simply replaced:

% ls -l access.log
-rw-r--r--  1 ppatterson  staff  11047145 Feb  9 15:57 access.log

% sha1sum access.log
7c97866ff59330b67aa96d7a481578d62e030788  access.log

% b2 upload-file my-bucket access.log new-access.log
{
...
    "contentSha1": "7c97866ff59330b67aa96d7a481578d62e030788",
...
    "fileInfo": {
        "src_last_modified_millis": "1675987069538"
    },
...
    "size": 11047145,
...
}

Everything is as we might expect—the CLI uploaded 11,047,145 bytes to create a new file, which is 5,521,296 bytes bigger than the initial upload.

Now I’ll use the --incrementalMode option to replace the first Backblaze B2 file:

% b2 upload-file --incrementalMode --quiet my-bucket access.log access.log
...
{
...
    "contentSha1": "none",
...
    "fileInfo": {
        "large_file_sha1": "7c97866ff59330b67aa96d7a481578d62e030788",
        "plan_id": "ea6b099b48e7eb7fce01aba18dbfdd72b56eb0c2",
        "src_last_modified_millis": "1675987069538"
    },
...
    "size": 11047145,
...
}

The digest is exactly the same, but it has moved from contentSha1 to fileInfo.large_file_sha1, indicating that the file was uploaded as separate parts, resulting in a large file. The CLI didn’t need to upload the initial 5,525,849 bytes of the local file; it instead instructed Backblaze B2 to combine the existing file with the final 5,521,296 bytes of the local file to create a new version of the file.
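
To see how many bytes an incremental upload would need to send for your own files, you can do the same arithmetic locally. The helpers below are hypothetical names used for illustration; they take the local file and the length of the existing remote file.

```shell
# bytes_to_upload and new_bytes are hypothetical helpers, not B2 CLI commands.

# Print how many bytes the local file has grown beyond the remote length.
# Usage: bytes_to_upload LOCAL_FILE REMOTE_LENGTH
bytes_to_upload() (
    local_length=$(wc -c < "$1" | tr -d ' ')
    echo $(( local_length - $2 ))
)

# Write just the appended bytes to standard output.
# tail -c +N starts at byte N, so add 1 to skip the first REMOTE_LENGTH bytes.
# Usage: new_bytes LOCAL_FILE REMOTE_LENGTH
new_bytes() {
    tail -c "+$(( $2 + 1 ))" "$1"
}
```

For the log file above, the local length is 11047145 and the remote length is 5525849, so bytes_to_upload access.log 5525849 would print 5521296.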

There are several more new features and fixes to existing functionality in version 3.7.0—make sure to check out the B2 CLI changelog for a complete list.

Major Changes in B2 Python SDK 1.19.0

Most of the changes in the B2 Python SDK support the new features in the B2 CLI, such as adding wildcard matching to the Bucket.ls operation and adding support for incremental upload and sync. Again, you can inspect the B2 Python SDK changelog for a comprehensive list.

Get to Grips with B2 Command Line Tool Version 3.7.1

Whether you’re working on Windows, Mac or Linux, it’s straightforward to install or update the B2 CLI; full instructions are provided in the Backblaze B2 documentation.

Note that the latest version is now 3.7.1. The only changes from 3.7.0 are a handful of corrections to help text and the removal of the Mac binary, due to shortcomings in the Mac version of PyInstaller. Instead, we provide the Mac version of the CLI via the Homebrew package manager.


About Pat Patterson

Pat Patterson is the chief technical evangelist at Backblaze. Over his three decades in the industry, Pat has built software and communities at Sun Microsystems, Salesforce, StreamSets, and Citrix. In his role at Backblaze, he creates and delivers content tailored to the needs of the hands-on technical professional, acts as the “voice of the developer” on the Product team, and actively participates in the wider technical community. Outside the office, Pat runs far, having completed ultramarathons up to the 50-mile distance. Catch up with Pat via Twitter or LinkedIn.