When data is central to your business, the stakes are high, and the anxiety is even higher during a cloud to cloud migration. One customer compared it to letting someone else carry your baby from one room to the next. What if there’s a pipeline failure? Data gets corrupted? Some files don’t transfer? These valid concerns, not to mention high egress fees, keep customers stuck with cloud providers even when they’d happily switch.
Controlling what-ifs was part of our inspiration for launching cloud to cloud migration, which has helped customers like Complex Networks, Motion, and many others successfully transfer hundreds of terabytes at speed.
Recently, a Silicon Valley climate-tech company approached us with a new what-if…
“What if we have block storage, and we want to migrate to object storage?”
Working with our migration partners, our teams set to the task of making block to object storage migration as easy and fast as any other migration. The process was an interesting challenge, so we wanted to share the technical learnings here.
The climate-tech company that approached us created a solution that uses AI to detect environmental hazards or changes rather than relying on bystanders or 911 calls. Their technology speeds emergency response with a connected, intelligent platform that unifies data from ultra-high-definition cameras, satellite imagery, field sensors, and other sources. All of that data collection meant the company accumulated more than 300 million files. Most were small images under 189KB, but together they added up to over 100TB. Their storage resided on Google Persistent Disk, and costs were escalating. They needed a way to move it easily and safely, but most importantly, at speed.
How to Migrate Block Storage to Object Storage
All of the cloud migrations we’ve done to date have focused on migrating object storage in another cloud to Backblaze B2 Cloud Storage, which is also object-based storage. We offer a few methods for doing so: internet transfers with multi-threading, or our integrated cloud to cloud data transfer partner Flexify.IO, a solution many of our customers choose.
Object storage to object storage migrations allow Flexify to deploy their migration engines outside of the customer’s environment. Flexify only needs to access customer data via application keys that the customer enters within their Flexify account. Since object storage can be accessed via an API over HTTP, it makes for a hands-off approach that doesn’t call for any software deployment on the part of the customer.
But what happens when that data sits not on object storage but on block storage like AWS EBS or Google Persistent Disk?
Object storage: A data storage method where each “object” contains all of the bytes that constitute a file, be it an image, video, document, etc., as well as any associated metadata. These objects are kept in a single, flat repository and assigned a unique identifier so they can be called when needed.
Block storage: A data storage method where the bytes that constitute a file are broken up across “blocks” that can be physically distributed across machines or repositories to maximize efficient storage space utilization. In this method, each block is assigned a unique identifier so any file software or operating system can reconstitute the bytes when a file is needed.
Want the nitty-gritty details? Read more about the difference between storage methods.
Like an external hard drive that needs to be plugged into your computer to get the data out, block storage must be attached to a host operating system before its data can be accessed. It can’t be accessed through an API, and therein lies our challenge: we had to find a way to translate block storage to object storage while keeping the customer’s hands-on involvement to a minimum.
Apples to Oranges: Translating Block Storage to Object Storage
So, how do you migrate 300 million files from block storage to a different storage type?
You could use a tool like rclone if you want to handle a block storage migration to Backblaze B2 Cloud Storage yourself. For smaller data sets, this approach works well. As the amount of data you need to move increases, though, using a command line tool like rclone becomes more cumbersome. The challenges include:
- Figuring out the right settings to use.
- Understanding how to network your cloud account to tap into the high-speed networks Backblaze has with other clouds.
- Using a private network rather than the public internet in order to benefit from lower egress fees.
Managing the transfer themselves with that much data was out of the question for the customer, so we turned to another integration partner, MinIO, for an assist. MinIO, known for its Kubernetes-native object storage, is an open-source application that acts as an object storage server. The benefit for us is that MinIO can run on any operating system and can easily translate files to objects. It presents the mounted block storage as if the data were stored in an object cloud. Once presented that way, any application, including Flexify, can access the data with S3-compatible API calls, just as it would through the Backblaze S3 Compatible API.
With MinIO configured in front of the climate-tech company’s block storage, Flexify could access it, but we needed to be able to list all of the objects before Flexify could start migrating them.
Extracting Metadata From Block Storage
To list the objects, a critical piece of the puzzle was missing: the metadata. Solving this challenge meant extracting the metadata for each file in block storage. To do so, MinIO performs a stat() call to the backend block storage system.
With 300 million files, this then posed a speed problem. Even though copying the data to Backblaze B2 took only seconds once it was listed, every page of 4,500 objects took MinIO two minutes to list. Processing that much data at that speed could take months, so we had to find a way to speed it up.
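To see why "months" is a fair estimate, here is a rough back-of-the-envelope calculation using only the figures quoted above. It assumes the two-minute page time stays constant across the whole data set, which is a simplification.

```python
# Figures quoted in the article; the constant per-page time is an assumption.
TOTAL_FILES = 300_000_000   # "300 million files"
PAGE_SIZE = 4_500           # objects per listing page
SECONDS_PER_PAGE = 120      # "two minutes" per page
SECONDS_PER_DAY = 86_400


def listing_days(total_files: int, page_size: int, seconds_per_page: float) -> float:
    """Days needed to list every object if each page takes a fixed time."""
    pages = -(-total_files // page_size)  # ceiling division
    return pages * seconds_per_page / SECONDS_PER_DAY


rate = PAGE_SIZE / SECONDS_PER_PAGE   # objects listed per second
days = listing_days(TOTAL_FILES, PAGE_SIZE, SECONDS_PER_PAGE)
print(f"{rate:.1f} objects/s, roughly {days:.0f} days just to list everything")
```

At about 37.5 objects per second, listing alone would take roughly three months, before accounting for any retries or pauses.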
Troubleshooting Performance Issues in Storage Translation
To improve performance and accelerate the translation process, we took a few different tacks, learning more about the problem along the way.
First, we started at the lowest level: the OS. We tried adding more memory and processors to the host compute engine. That should have increased performance, but it had no impact. Since the virtual machine running MinIO already met the recommended settings of 4 vCPUs and 16GB RAM, we deduced the issue wasn’t with CPU or memory. It had to be something else.
Second, after researching what else could cause the slowdown, we learned that rotational media like HDD-based RAID or Google Cloud’s Persistent Disk can suffer from high latency. Response times might be as high as 20 milliseconds, which limits I/O issued one request at a time to around 50 IOPS (one second divided by 20 milliseconds). Limited IOPS means longer times to read from and write to storage.
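The relationship between latency and IOPS can be sketched in a few lines. This is an idealized model, not a benchmark: the function name and the `in_flight` parameter are made up for illustration, and real disks stop scaling well before the formula does.

```python
def max_ops_per_second(latency_ms: float, in_flight: int = 1) -> float:
    """Idealized ceiling on operations per second for a given request latency.

    in_flight is the number of requests outstanding at once; 1 means each
    request waits for the previous one to finish.
    """
    if latency_ms <= 0 or in_flight < 1:
        raise ValueError("latency_ms must be positive and in_flight at least 1")
    return in_flight * 1000.0 / latency_ms


serial = max_ops_per_second(20)         # one request at a time at 20 ms
overlapped = max_ops_per_second(20, 8)  # hypothetical: 8 requests in flight
print(serial, overlapped)
```

With one request at a time, 20 milliseconds of latency caps the disk at 50 operations per second no matter how much CPU or memory the host has. Under this model, the only ways to go faster are to lower the latency or to keep more requests in flight.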
Third, we decided to look further up in MinIO’s application layer. We thought perhaps the load on MinIO was too much, so we split the load across two MinIO servers. Strike three: this didn’t help either. Listing the objects was still taking too long and impeding our overall throughput.
Finally, after we eliminated the other potential bottlenecks, we looked at the underlying calls MinIO was making. We uncovered that MinIO was not taking advantage of parallelism when listing objects. Each object required a separate stat() system call. On high-latency rotational media, MinIO would not be able to list more than approximately 50 objects per second. This would limit the total throughput of the migration, especially when objects are relatively small in size.
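To make the pattern concrete, here is a small Python illustration of listing files with one stat() call at a time versus several in flight. It is not MinIO's code (MinIO is written in Go), and the function names, worker count, and sample files are invented for the example. On a fast local disk the two versions will perform about the same; the difference only shows up when each call is slow.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def list_sequential(directory: str) -> dict[str, int]:
    """Map file name -> size, issuing one stat() call after another."""
    sizes = {}
    for name in sorted(os.listdir(directory)):
        sizes[name] = os.stat(os.path.join(directory, name)).st_size
    return sizes


def list_parallel(directory: str, workers: int = 16) -> dict[str, int]:
    """Same listing, but with up to `workers` stat() calls in flight at once."""
    names = sorted(os.listdir(directory))
    paths = [os.path.join(directory, name) for name in names]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stats = pool.map(os.stat, paths)  # results come back in input order
        return {name: st.st_size for name, st in zip(names, stats)}


# Tiny demo on a temporary directory standing in for the mounted disk.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(100):
        with open(os.path.join(tmp, f"img_{i:03d}.jpg"), "wb") as f:
            f.write(b"x" * i)
    sequential = list_sequential(tmp)
    parallel = list_parallel(tmp)

print(sequential == parallel, len(parallel))
```

Both functions return the same listing. The sequential version pays the full storage latency once per file, while the threaded version overlaps those waits.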
Solving Performance Issues for Fast Block to Object Transfers
The Flexify team dug into the MinIO code and, after engaging MinIO for guidance, realized that disabling the sequential stat() calls would allow us to vastly increase the throughput. Deploying a special build of MinIO that did not make the stat() calls allowed us to achieve 2 Gbit/s sustained, cutting the overall migration timeline down from weeks or months to just five to seven days.
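As a sanity check on that timeline, the arithmetic below estimates the transfer time at a sustained 2 Gbit/s. It assumes exactly 100TB in decimal terabytes and exactly 300 million files. The article says "over" and "more than" for both, so treat the results as lower bounds.

```python
# Assumed round numbers taken from the article's approximate figures.
TOTAL_BYTES = 100e12      # "over 100TB", decimal terabytes assumed
TOTAL_FILES = 300e6       # "more than 300 million files"
RATE_BITS_PER_S = 2e9     # 2 Gbit/s sustained
SECONDS_PER_DAY = 86_400


def transfer_seconds(total_bytes: float, bits_per_second: float) -> float:
    """Seconds needed to move total_bytes at a constant line rate."""
    return total_bytes * 8 / bits_per_second


seconds = transfer_seconds(TOTAL_BYTES, RATE_BITS_PER_S)
days = seconds / SECONDS_PER_DAY
objects_per_second = TOTAL_FILES / seconds
print(f"about {days:.1f} days, about {objects_per_second:.0f} objects/s on average")
```

That works out to a little over four and a half days of pure transfer time, which is consistent with a five to seven day migration once overhead and the extra data beyond 100TB are included. It also implies an average of about 750 objects per second, far above the roughly 50 objects per second that listing allowed on high-latency media.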
Enabling Cloud to Cloud Migration Between Storage Types
Throughout the collaboration, the lessons our Partner team learned allowed us to build a better foundation for future data migrations between storage types, turning another what-if into a “why not?”
We love interesting use cases as much as we love carrying data carefully from one room to the next. Let us know if you have a what-if we can solve, and check out our cloud to cloud migration offer and partners—we’ll pay for your data transfer if you need to move more than 10TB out of Amazon.