Rclone is an open source, command line tool for file management, and it’s widely used to copy data between local storage and an array of cloud storage services, including Backblaze B2 Cloud Storage. Rclone has had a long association with Backblaze—support for Backblaze B2 was added back in January 2016, just two months before we opened Backblaze B2’s public beta, and five months before the official launch—and it’s become an indispensable tool for many Backblaze B2 customers. If you want to explore the solution further, check out the Integration Guide.
UPDATE: rclone v1.64.1 was released on October 17, 2023, fixing the bug that caused some multipart uploads to fail with a “sizes differ” error, as well as a couple of other bugs that crept in during the rewrite of rclone’s B2 integration in v1.64.0. Our testing confirms that rclone v1.64.1 delivers the same performance improvements as v1.64.0 while fixing the issues we encountered with that version. Go ahead and upgrade rclone to v1.64.1 (or whatever the current version is when you read this)!
Does it deliver? Should you upgrade? Read on to find out!
Multithreading to Boost File Transfer Performance
Something of a Swiss Army Knife for cloud storage, rclone can copy files, synchronize directories, and even mount remote storage as a local filesystem. Previous versions of rclone were able to take advantage of multithreading to accelerate the transfer of “large” files (by default at least 256MB), but the benefits were limited.
When transferring files from a storage system to Backblaze B2, rclone would read chunks of the file into memory in a single reader thread, starting a set of multiple writer threads to simultaneously write those chunks to Backblaze B2. When the source storage was a local disk (the common case) as opposed to remote storage such as Backblaze B2, this worked really well—the operation of moving files from local disk to Backblaze B2 was quite fast. However, when the source was another remote storage—say, transferring from Amazon S3 to Backblaze B2, or even Backblaze B2 to Backblaze B2—data chunks were read into memory by that single reader thread at about the same rate as they could be written to the destination, meaning that all but one of the writer threads were idle.
What’s the Big Deal About Rclone v1.64.0?
Rclone v1.64.0 completely refactors multithreaded transfers. Now rclone starts a single set of threads, each of which both reads a chunk of data from the source service into memory, and then writes that chunk to the destination service, iterating through a subset of chunks until the transfer is complete. The threads transfer their chunks of data in parallel, and each transfer is independent of the others. This architecture is both simpler and much, much faster.
Show Me the Numbers!
How much faster? I spun up a virtual machine (VM) via our compute partner, Vultr, and downloaded both rclone v1.64.0 and the preceding version, v1.63.1. As a quick test, I used Rclone’s
copyto command to copy 1GB and 10GB files from Amazon S3 to Backblaze B2, like this:
rclone --no-check-dest copyto s3remote:my-s3-bucket/1gigabyte-test-file b2remote:my-b2-bucket/1gigabyte-test-file
Note that I made no attempt to “tune” rclone for my environment by setting the chunk size or number of threads. I was interested in the out of the box performance. I used the
--no-check-dest flag so that rclone would overwrite the destination file each time, rather than detecting that the files were the same and skipping the copy.
I ran each
copyto operation three times, then calculated the average time. Here are the results; all times are in seconds:
As you can see, the difference is significant! The new rclone transferred both files around three times faster than the previous version.
So, copying individual large files is much faster with the latest version of rclone. How about migrating a whole bucket containing a variety of file sizes from Amazon S3 to Backblaze B2, which is a more typical operation for a new Backblaze customer? I used rclone’s
copy command to transfer the contents of an Amazon S3 bucket—2.8GB of data, comprising 35 files ranging in size from 990 bytes to 412MB—to a Backblaze B2 Bucket:
rclone --fast-list --no-check-dest copyto s3remote:my-s3-bucket b2remote:my-b2-bucket
Much to my dismay, this command failed, returning errors related to the files being corrupted in transfer, for example:
2023/09/18 16:00:37 ERROR : tpcds-benchmark/catalog_sales/20221122_161347_00795_djagr_3a042953-d0a2-4b8d-8c4e-6a88df245253: corrupted on transfer: sizes differ 244695498 vs 0
Rclone was reporting that the transferred files in the destination bucket contained zero bytes, and deleting them to avoid the use of corrupt data.
After some investigation, I discovered that the files were actually being transferred successfully, but a bug in rclone 1.64.0 (now fixed—see below!) caused the app to incorrectly interpret some successful transfers as corrupted, and thus delete the transferred file from the destination.
I was able to use the
--ignore-size flag to workaround the bug by disabling the file size check so I could continue with my testing:
rclone --fast-list --no-check-dest --ignore-size copyto s3remote:my-s3-bucket b2remote:my-b2-bucket
A Word of Caution to Control Your Transaction Fees
Note the use of the
--fast-list flag. By default, rclone’s method of reading the contents of cloud storage buckets minimizes memory usage at the expense of making a “list files” call for every subdirectory being processed. Backblaze B2’s list files API,
b2_list_file_names, is a class C transaction, priced at $0.004 per 1,000 with 2,500 free per day. This doesn’t sound like a lot of money, but using rclone with large file hierarchies can generate a huge number of transactions. Backblaze B2 customers have either hit their configured caps or incurred significant transaction charges on their account when using rclone without the
We recommend you always use
--fast-list with rclone if at all possible. You can set an environment variable so you don’t have to include the flag in every command:
Again, I performed the copy operation three times, and averaged the results:
|Rclone version||2.8GB tree|
Since the bucket contains both large and small files, we see a lesser, but still significant, improvement in performance with rclone v1.64.0—it’s about 33% faster than the previous version with this set of files.
So, Should I Upgrade to the Latest Rclone?
As outlined above, rclone v1.64.0 contains a bug that can cause copy (and presumably also
sync) operations to fail. If you want to upgrade to v1.64.0 now, you’ll have to use the
--ignore-size workaround. If you don’t want to use the workaround, it’s probably best to hold off until rclone releases v1.64.1, when the bug fix will likely be deployed—I’ll come back and update this blog entry when I’ve tested it!
Rclone v1.64.1, released on October 17, 2023, delivers the performance improvements described above and includes a fix for the bug that caused some multipart uploads to fail with a “sizes differ” error. We’ve tested that it both delivers on the promised performance improvements and successfully copies both individual files and buckets of files. We now recommend that you upgrade rclone to v1.64.1 (or the current version as you read this)—you can find it on the rclone download page.