Graylark

Powering Frontline Visual Intelligence with Scalable Cloud Storage


We’ve been able to get 20TB on a machine within hours. Before, that would take days or even weeks. Backblaze has saved us significant cost, extended our runway, and made it feasible to train models the same day.

Frank Wolters, CTO, Graylark

10x

Faster Transfers

8x

Cost Savings

20TB

Downloaded in Hours

Situation

Graylark is building a frontline visual intelligence platform powered by a large geospatial model (LGM). Training highly specialized AI models requires ingesting and managing petabytes of spatial data. As data volumes rapidly expanded, traditional cloud storage became increasingly costly and slow, threatening both model development speed and startup runway.

Solution

Graylark adopted Backblaze B2 as a scalable, cost-effective repository for model training datasets. Using rclone and optimized batch folder structures, the team migrated and restructured billions of files into segmented datasets. Backblaze’s S3-compatible API and high-throughput performance enabled fast parallel downloads directly to GPU training environments.

Result

With Backblaze, Graylark reduced storage costs by an estimated 3–8x compared to hyperscale alternatives while dramatically improving data transfer speeds. What once took days now takes hours, allowing same-day model training. The savings extended operational runway, while reliable performance ensured uninterrupted development of mission-critical AI systems.

How It Works

Graylark’s platform operates across a hybrid cloud environment. User-uploaded images enter a traditional cloud production environment. Meanwhile, large-scale training datasets—millions to billions of spatially indexed images—are stored in Backblaze B2.

Using rclone with parallelized transfers and optimized folder segmentation (hundreds of thousands of objects per batch), Graylark can download up to 20TB of data within hours to GPU training instances. This architecture enables rapid experimentation, retraining, and deployment of highly specialized geospatial models without bottlenecks.
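The segmentation and parallel-transfer approach described above can be sketched roughly as follows. This is a minimal illustration, not Graylark's actual tooling: the batch size, the `batch-NNNNNN` folder naming, the remote name `b2:training`, and the helper functions are all hypothetical. The only rclone options used are the standard `--transfers` and `--fast-list` flags, and the command is built but not executed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Assumption: a batch size in the "hundreds of thousands" range; the real
# value used by Graylark is not stated in the case study.
BATCH_SIZE = 200_000


def assign_batches(
    keys: Iterable[str], batch_size: int = BATCH_SIZE
) -> Iterator[Tuple[str, List[str]]]:
    """Group object keys into numbered batch folders of bounded size."""
    it = iter(keys)
    index = 0
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Hypothetical naming scheme: batch-000000, batch-000001, ...
        yield f"batch-{index:06d}", chunk
        index += 1


def rclone_copy_command(
    remote: str, batch: str, dest: str, transfers: int = 64
) -> List[str]:
    """Build (but do not run) an rclone command that copies one batch folder."""
    return [
        "rclone", "copy",
        f"{remote}/{batch}",          # source: one segmented folder in the bucket
        f"{dest}/{batch}",            # destination: local disk on the GPU instance
        "--transfers", str(transfers),  # number of files copied in parallel
        "--fast-list",                # fewer listing calls, at the cost of more memory
    ]


if __name__ == "__main__":
    sample_keys = (f"img{i}.jpg" for i in range(5))
    for name, chunk in assign_batches(sample_keys, batch_size=2):
        print(name, len(chunk), " ".join(rclone_copy_command("b2:training", name, "/data")))
```

Keeping each folder to a bounded object count means every batch can be listed and copied independently, so several batches can be pulled at once and a failed batch can be retried without rescanning the whole dataset.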

Graylark is an AI company building a frontline visual intelligence platform. Its proprietary large geospatial model (LGM) analyzes imagery to identify precise locations and contextual intelligence in seconds.

  • Founded: 2022
  • Industries served: Law enforcement and government agencies
  • Data in B2: Scaling toward 3PB+

Scaling to petabytes

In just months, Graylark scaled from early prototype datasets to over a petabyte of data—with expectations to exceed three petabytes. To support this growth, they needed storage that could:

  • Economically retain cold training data
  • Handle massive object counts
  • Scale without punitive egress or retention fees

Backblaze provided predictable pricing without minimum duration penalties.

Especially in AI, you don’t throw data away. Backblaze lets us keep what we need without worrying about runaway storage costs.

Frank Wolters, CTO, Graylark

From days to hours

Previously, downloading large datasets from traditional cloud storage could take 12–16 hours—or longer depending on structure. With Backblaze:

  • 20TB can be downloaded in 2–3 hours
  • Parallelized transfers reduce bottlenecks
  • Proper segmentation prevents indexing slowdowns

The result is that model training can begin the same day a new initiative starts, accelerating R&D cycles dramatically.
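To put the 20TB figure in perspective, the sustained throughput it implies can be estimated with simple arithmetic. The sketch below assumes decimal terabytes (1 TB = 10^12 bytes) and a steady transfer rate; the function name is illustrative only.

```python
def required_gbps(terabytes: float, hours: float) -> float:
    """Average throughput in Gbit/s needed to move `terabytes` in `hours`."""
    bits = terabytes * 1e12 * 8          # decimal TB -> bits (assumption)
    return bits / (hours * 3600) / 1e9   # bits per second -> Gbit/s


if __name__ == "__main__":
    for hours in (2, 3):
        print(f"20 TB in {hours} h needs about {required_gbps(20, hours):.1f} Gbit/s")
```

Under these assumptions, 20TB in 2–3 hours works out to roughly 15 to 22 Gbit/s sustained, which is why parallelized transfers matter: a rate in that range is typically reached by running many streams at once rather than a single connection.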

There’s nothing worse than starting an AI initiative and waiting days just to download data. Now we can move terabytes within hours.

Frank Wolters, CTO, Graylark

Extended startup runway

As a venture-backed startup, infrastructure efficiency directly impacts runway. Graylark evaluated multiple providers, including hyperscalers and alternative object storage platforms, and estimates it would be paying 3–8x more with alternative providers for similar storage volumes. Backblaze enables the team to preserve capital, allocate resources toward model development, and scale data aggressively without financial hesitation. They also appreciate the stable performance and seamless integration into their existing workflows. For an AI company dependent on vast datasets, continuity is non-negotiable and cost predictability is a strategic edge.

We’ve saved significant cost. It’s extended our runway greatly—and that’s huge for a startup at our stage.

Frank Wolters, CTO, Graylark

A Publicly Traded Company (BLZE)
Backblaze © 2024