Graylark

Frank Wolters, CTO, Graylark
Faster Transfers
Cost Savings
Downloaded in Hours
Graylark is building a frontier visual intelligence platform powered by a large geospatial model (LGM). Training highly specialized AI models requires ingesting and managing petabytes of spatial data. As data volumes rapidly expanded, traditional cloud storage became increasingly costly and slow—threatening both model development speed and startup runway.
Graylark adopted Backblaze B2 as a scalable, cost-effective repository for model training datasets. Using rclone and optimized batch folder structures, the team migrated and restructured billions of files into segmented datasets. Backblaze’s S3-compatible API and high-throughput performance enabled fast parallel downloads directly to GPU training environments.
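The case study does not describe the folder layout itself, so the following is only a minimal sketch of what segmenting files into fixed-size batch folders could look like. The `batch_NNNNNN` naming scheme, the 200,000-object batch size, and both function names are assumptions made for illustration, not Graylark's actual convention.

```python
def batch_prefix(file_index: int, batch_size: int = 200_000) -> str:
    """Return the (hypothetical) batch folder name for the file at file_index.

    Files are grouped by position: the first batch_size files land in
    batch_000000, the next batch_size in batch_000001, and so on.
    """
    if file_index < 0:
        raise ValueError("file_index must be non-negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return f"batch_{file_index // batch_size:06d}"


def batched_key(file_index: int, filename: str, batch_size: int = 200_000) -> str:
    """Build the object key for a file inside its batch folder."""
    return f"{batch_prefix(file_index, batch_size)}/{filename}"
```

Keeping each folder to a bounded number of objects means a transfer tool can list and copy one batch at a time instead of walking a single flat namespace of billions of keys.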
With Backblaze, Graylark reduced storage costs by an estimated 3–8x compared to hyperscale alternatives while dramatically improving data transfer speeds. What once took days now takes hours, allowing same-day model training. The savings extended operational runway, while reliable performance ensured uninterrupted development of mission-critical AI systems.
Graylark’s platform operates across a hybrid cloud environment. User-uploaded images enter a traditional cloud production environment. Meanwhile, large-scale training datasets—millions to billions of spatially indexed images—are stored in Backblaze B2.
Using rclone with parallelized transfers and optimized folder segmentation (hundreds of thousands of objects per batch), Graylark can download up to 20TB of data within hours to GPU training instances. This architecture enables rapid experimentation, retraining, and deployment of highly specialized geospatial models without bottlenecks.
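As a rough illustration of the parallelized download step, the sketch below builds one `rclone copy` command per batch folder. The `--transfers`, `--checkers`, and `--fast-list` flags are standard rclone options, but the values chosen, the remote name `b2:training-data`, the local path, and the helper function names are all assumptions. The case study does not state the settings Graylark actually uses.

```python
import subprocess
from typing import List


def build_rclone_copy(remote_path: str, local_dir: str,
                      transfers: int = 64, checkers: int = 64) -> List[str]:
    """Build an rclone copy command with parallel file transfers.

    The transfers/checkers values are illustrative and should be tuned
    to the instance's network and disk bandwidth.
    """
    return [
        "rclone", "copy", remote_path, local_dir,
        "--transfers", str(transfers),  # number of files copied in parallel
        "--checkers", str(checkers),    # number of parallel comparison workers
        "--fast-list",                  # fewer listing calls, more memory
    ]


def download_batches(remote_root: str, batches: List[str], local_root: str,
                     dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one rclone command per batch folder."""
    commands = []
    for batch in batches:
        cmd = build_rclone_copy(f"{remote_root}/{batch}", f"{local_root}/{batch}")
        commands.append(cmd)
        if not dry_run:
            # Requires rclone to be installed and the remote to be configured.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `download_batches("b2:training-data", ["batch_000000"], "/data")` with the default `dry_run=True` only returns the commands, which makes it easy to inspect them before running anything on a GPU instance.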

Graylark is an AI company building a frontier visual intelligence platform. Its proprietary large geospatial model (LGM) analyzes imagery to identify precise locations and contextual intelligence in seconds.
In just months, Graylark scaled from early prototype datasets to over a petabyte of data—with expectations to exceed three petabytes. To support this growth, they needed storage that could:
Backblaze provided predictable pricing without minimum duration penalties.
Especially in AI, you don’t throw data away. Backblaze lets us keep what we need without worrying about runaway storage costs.
Frank Wolters, CTO, Graylark
Previously, downloading large datasets from traditional cloud storage could take 12–16 hours—or longer depending on structure. With Backblaze:
The result is that model training can begin the same day a new initiative starts, accelerating R&D cycles dramatically.
There’s nothing worse than starting an AI initiative and waiting days just to download data. Now we can move terabytes within hours.
Frank Wolters, CTO, Graylark
As a venture-backed startup, infrastructure efficiency directly impacts runway. Graylark evaluated multiple providers, including hyperscalers and alternative object storage platforms, and estimates they would be paying 3–8x more with alternative providers for similar storage volumes. Backblaze enables them to preserve capital, allocate resources toward model development, and scale data aggressively without financial hesitation. They also appreciate the stable performance and seamless integration into their existing workflows. For an AI company dependent on vast datasets, continuity is non-negotiable and cost predictability is a strategic edge.
We’ve saved significant cost. It’s extended our runway greatly—and that’s huge for a startup at our stage.
Frank Wolters, CTO, Graylark