{"id":112739,"date":"2026-01-21T09:53:01","date_gmt":"2026-01-21T17:53:01","guid":{"rendered":"https:\/\/www.backblaze.com\/blog\/?p=112739"},"modified":"2026-01-21T09:53:02","modified_gmt":"2026-01-21T17:53:02","slug":"a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable","status":"publish","type":"post","link":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/","title":{"rendered":"A Developer&#8217;s Guide to Migrating Multimodal AI Training Data (and Putting It to Work) with Pixeltable"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1440\" height=\"820\" src=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1.png\" alt=\"A decorative image showing gears and a cloud.\" class=\"wp-image-112740\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1.png 1440w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1-300x171.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1-1024x583.png 1024w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1-768x437.png 768w\" sizes=\"auto, (max-width: 1440px) 100vw, 1440px\" \/><\/figure>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Today&#8217;s AI models consume much more than text: product images, surveillance video, customer call recordings, and metadata spread across an ever-expanding set of systems. These multimodal datasets power everything from computer vision pipelines to customer service automation. 
But as they scale, the underlying infrastructure starts to creak.<\/p>\n\n\n\n<p>Costs can become unpredictable. Data fragments across S3 buckets, HDFS clusters, and local drives. Maintaining cross-modal alignment (keeping media files linked to their labels, embeddings, and annotations) becomes a bottleneck that slows development to a crawl.<\/p>\n\n\n\n<p>This article outlines a practical path forward: how to migrate multimodal training data using proven open-source tools, and how <a href=\"https:\/\/www.pixeltable.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Pixeltable<\/a> helps unify and index that data for training once it lands in <a href=\"https:\/\/www.backblaze.com\/blog\/building-multimodal-ai-data-infrastructure-with-pixeltable\/\" target=\"_blank\" rel=\"noreferrer noopener\">Backblaze B2<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Moving multimodal training data: Practical open-source software (OSS) tools that do the heavy lifting<\/h2>\n\n\n\n<p>Before you can train on consolidated data, you need to get it all into one place. 
These three open-source tools handle the migration and versioning work, each addressing a different piece of the puzzle.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Apache NiFi for moving large media reliably<\/h3>\n\n\n\n<p>When your dataset includes terabytes of video files, thousands of high-resolution images, or large binary assets like LiDAR scans, you need something more robust than a shell script. <a href=\"https:\/\/nifi.apache.org\/\">Apache NiFi<\/a> is a dataflow automation system built to move data between systems reliably, including large media files at scale.<\/p>\n\n\n\n<p>NiFi provides:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Flow control and retry logic<\/strong> that handle network interruptions gracefully, which is essential when transferring terabytes of data over hours or days.<\/li>\n\n\n\n<li><strong>Data provenance tracking<\/strong> that records exactly which files moved where and when, making it possible to debug issues without guessing.<\/li>\n\n\n\n<li><strong>A visual workflow designer<\/strong> that lets you build and monitor data flows without writing custom code.<\/li>\n<\/ul>\n\n\n\n<p>For multimodal datasets where media volume dominates, NiFi helps ensure that files arrive intact and remain traceable. Check the <a href=\"https:\/\/nifi.apache.org\/docs\/nifi-docs\/html\/user-guide.html\" target=\"_blank\" rel=\"noreferrer noopener\">Apache NiFi User Guide<\/a> to get started building your first data flow.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Airbyte for syncing structured and semi-structured metadata<\/h3>\n\n\n\n<p>Media files are only half the story. 
Annotations, labels, captions, transcripts, and database records provide the context that makes raw media useful for training. <a href=\"https:\/\/airbyte.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Airbyte<\/a> is well suited to moving this structured and semi-structured metadata.<\/p>\n\n\n\n<p>Airbyte handles:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Schema consistency<\/strong> when pulling metadata from multiple sources, helping keep annotation formats from drifting between your labeling platform, your CRM, and your feature store.<\/li>\n\n\n\n<li><strong>Incremental syncs<\/strong> that transfer only new or changed records, avoiding unnecessary data movement as your datasets grow.<\/li>\n\n\n\n<li><strong>Multiple data systems<\/strong> via a broad catalog of connectors for databases, SaaS platforms, file formats, and cloud storage services.<\/li>\n<\/ul>\n\n\n\n<p>Unlike NiFi, which this workflow uses for raw file movement, Airbyte is schema-aware: it moves records and columns rather than opaque files. Use it to keep your metadata in sync across systems. The <a href=\"https:\/\/docs.airbyte.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Airbyte documentation<\/a> provides setup guides for most common data sources.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">lakeFS for versioning and reproducible training<\/h3>\n\n\n\n<p>After moving media via NiFi and metadata via Airbyte, you need a way to snapshot the entire dataset so you can reproduce a training run six months later. <a href=\"https:\/\/lakefs.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">lakeFS<\/a> brings Git-like version control to object storage.<\/p>\n\n\n\n<p>lakeFS enables:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Branching and snapshots<\/strong> of entire datasets without copying data. 
You can create a branch, run an experiment, and merge or discard the results.<\/li>\n\n\n\n<li><strong>Atomic commits<\/strong> that capture media, metadata, and derived features together, so they stay aligned as your corpus evolves.<\/li>\n\n\n\n<li><strong>Zero-copy clones<\/strong> that let multiple teams work on isolated versions of production data without duplicating unchanged objects.<\/li>\n<\/ul>\n\n\n\n<p>lakeFS acts as a version control layer on top of storage like Backblaze B2, recording changes as metadata instead of copying objects. When a training run produces a new model, you can tag the exact dataset version that went into it. The <a href=\"https:\/\/docs.lakefs.io\/v1.72\/quickstart\/\" target=\"_blank\" rel=\"noreferrer noopener\">lakeFS quickstart guide<\/a> walks through creating your first repository and branch.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">After migration, the hard part begins: Making the dataset usable<\/h2>\n\n\n\n<p>Moving data into object storage solves logistics, not usability. Even in B2, your media files, labels, and derived features remain scattered: images in one prefix, annotations in another, embeddings in a third. Training code becomes a tangle of custom loaders that stitch everything together, break when datasets change, and consume more engineering time than model tuning.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Where Pixeltable fits<\/h3>\n\n\n\n<p>Pixeltable provides the missing layer between migrated storage and training-ready data. It&#8217;s an open-source, declarative data infrastructure layer designed for multimodal AI applications.<\/p>\n\n\n\n<p>Here&#8217;s what Pixeltable does:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Unifies media and metadata<\/strong> into a single table interface: images, video frames, audio clips, and their associated labels, embeddings, and annotations live in one queryable structure.<\/li>\n\n\n\n<li><strong>Stores computed results automatically<\/strong>. 
Run OCR on documents, generate CLIP embeddings for images, or extract audio transcripts once, and Pixeltable stores the results for reuse instead of recomputing them.<\/li>\n\n\n\n<li><strong>References Backblaze B2 objects directly<\/strong> instead of importing them into a separate store. Files stay in Backblaze B2, and Pixeltable maintains pointers and metadata in a local Postgres instance. Pixeltable caches files locally when they are accessed and can write media files back to B2 (see our <a href=\"https:\/\/github.com\/backblaze-b2-samples\/b2-pixeltable-multimodal-data\" target=\"_blank\" rel=\"noreferrer noopener\">b2-pixeltable-multimodal-data sample project<\/a> for examples).<\/li>\n\n\n\n<li><strong>Supports built-in transforms<\/strong> like embedding generation, image captioning, and OCR. Define a transformation once as a computed column, and it runs incrementally as new data arrives.<\/li>\n<\/ul>\n\n\n\n<p>Instead of maintaining custom loaders and indexing scripts, you define a schema once, and Pixeltable handles orchestration, caching, and queries. The result is a training dataset you can slice, filter, and feed directly into <a href=\"https:\/\/docs.pytorch.org\/tutorials\/beginner\/basics\/data_tutorial.html\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch DataLoaders<\/a> or <a href=\"https:\/\/huggingface.co\/docs\/datasets\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face Datasets<\/a>.<\/p>\n\n\n\n<p>Check the <a href=\"https:\/\/docs.pixeltable.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Pixeltable documentation<\/a> to see how tables, computed columns, and queries work in practice.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A practical end-to-end workflow<\/h2>\n\n\n\n<p>Here&#8217;s how these tools fit together in a typical pipeline:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. 
Move media via NiFi \u2192 Backblaze B2<\/h3>\n\n\n\n<p>Set up an <a href=\"https:\/\/nifi.apache.org\/docs\/nifi-docs\/html\/getting-started.html\" target=\"_blank\" rel=\"noreferrer noopener\">Apache NiFi flow<\/a> to transfer images, video files, or other large binaries from your current storage (on-premises NAS, another cloud provider, or local drives) to a Backblaze B2 bucket. Configure retry logic and provenance tracking so you can verify that every file arrived.<\/p>\n\n\n\n<p>Use NiFi processors like <code>GetFile<\/code> (or <code>ListS3<\/code> and <code>FetchS3Object<\/code> when the source is another cloud bucket), <code>PutS3Object<\/code>, and <code>RouteOnAttribute<\/code> to handle file movement and error routing. Because Backblaze B2 offers an <a href=\"https:\/\/www.backblaze.com\/apidocs\/introduction-to-the-s3-compatible-api\" target=\"_blank\" rel=\"noreferrer noopener\">S3-compatible API<\/a>, NiFi&#8217;s S3 processors can write to it directly. Set the processor&#8217;s Endpoint Override URL to your bucket&#8217;s B2 S3 endpoint and use a B2 application key as the access credentials.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Sync metadata via Airbyte<\/h3>\n\n\n\n<p>Configure <a href=\"https:\/\/docs.airbyte.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Airbyte<\/a> to pull annotations, labels, captions, and database records from your labeling tool, feature store, or other sources. Set up connections that sync metadata incrementally as it changes. If annotations live in Postgres and captions come from a cloud-based labeling platform, Airbyte can land both in a single destination with a consistent format. That destination can be a Backblaze B2 bucket (via Airbyte&#8217;s S3 destination pointed at the B2 endpoint) or a dedicated metadata store.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Create a lakeFS branch to snapshot the dataset<\/h3>\n\n\n\n<p>Initialize a <a href=\"https:\/\/docs.lakefs.io\/latest\/understand\/model\/\" target=\"_blank\" rel=\"noreferrer noopener\">lakeFS repository<\/a> that uses your Backblaze B2 bucket as its storage namespace. Create a branch to isolate changes to this version of the dataset, then commit (and optionally tag) it to record a snapshot. If something goes wrong during training, you can roll back or compare versions. 
Use the <a href=\"https:\/\/docs.lakefs.io\/latest\/reference\/cli\/\" target=\"_blank\" rel=\"noreferrer noopener\">lakeFS CLI<\/a> (<code>lakectl<\/code>) or the Python client to create branches and commits programmatically.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. Define a Pixeltable schema referencing B2 objects + synced metadata<\/h3>\n\n\n\n<p>In Pixeltable, create a table with an image column whose values reference objects in Backblaze B2, plus columns for labels, captions, and any other metadata fields. Import your data so each row represents one training example: one image, its label, its caption, and any associated metadata. Pixeltable doesn&#8217;t copy image files into its own storage. It stores references and metadata and caches files locally when they are accessed, while the originals stay in Backblaze B2. The <a href=\"https:\/\/docs.pixeltable.com\/platform\/tables\" target=\"_blank\" rel=\"noreferrer noopener\">Pixeltable Tables guide<\/a> explains how to create tables with multimodal column types and import data from external sources.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5. Run transforms (embeddings, captions, OCR) inside Pixeltable<\/h3>\n\n\n\n<p>Define computed columns for embeddings, captions, or OCR results. When you add a <a href=\"https:\/\/docs.pixeltable.com\/tutorials\/computed-columns\" target=\"_blank\" rel=\"noreferrer noopener\">computed column<\/a>, Pixeltable evaluates it for the rows already in the table and stores the results alongside your data.<\/p>\n\n\n\n<p>For example, you can add CLIP embeddings using Pixeltable&#8217;s built-in Hugging Face integration, or generate AI captions using OpenAI&#8217;s vision API. 
Once defined, these columns compute incrementally: newly inserted images are processed automatically, without reprocessing the entire dataset.<\/p>\n\n\n\n<p>The <a href=\"https:\/\/pixeltable.github.io\/pixeltable\/\" target=\"_blank\" rel=\"noreferrer noopener\">Pixeltable API reference<\/a> documents the built-in functions for common operations such as embedding generation, image processing, and text analysis.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6. Query or filter the unified dataset<\/h3>\n\n\n\n<p>Use Pixeltable&#8217;s query interface to filter, sort, and slice your data. For example, you can find all images labeled &#8220;cat&#8221; whose embeddings are similar to a reference image (similarity search relies on an embedding index over the image column). You can also extract rows where captions mention &#8220;outdoor&#8221; and timestamps fall within a specific range.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">7. Feed batches directly into PyTorch\/Hugging Face<\/h3>\n\n\n\n<p>Export query results from Pixeltable into a form your training framework consumes, such as a PyTorch dataset wrapped in a <a href=\"https:\/\/docs.pytorch.org\/tutorials\/beginner\/basics\/data_tutorial.html\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch DataLoader<\/a> or a dataset for <a href=\"https:\/\/huggingface.co\/docs\/datasets\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face Datasets<\/a>. Pixeltable takes care of data access, including fetching media from B2, while the DataLoader handles batching. 
That keeps your training loop clean.<\/p>\n\n\n\n<p>The Pixeltable documentation covers the available export formats and integrations with popular ML frameworks, which can cut down on intermediate files and one-off conversion scripts between data preparation and model training.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">From fragmented storage to production-ready training data<\/h2>\n\n\n\n<p>Multimodal AI datasets don&#8217;t have to be a maintenance nightmare. 
By chaining together proven open-source tools (NiFi and Airbyte for migration, lakeFS for versioning, and Pixeltable for unified access), you can turn scattered files and metadata into queryable training assets.<\/p>\n\n\n\n<p>Once data lands in Backblaze B2, this stack removes much of the custom glue code and brittle loaders that typically slow down training workflows, and it keeps media and metadata aligned. Your team gets reproducible datasets, clean interfaces, and more time for model development instead of infrastructure firefighting.<\/p>\n\n\n\n<p>Ready to get started? Check out the <a href=\"https:\/\/www.backblaze.com\/docs\" target=\"_blank\" rel=\"noreferrer noopener\">Backblaze B2 documentation<\/a> to set up your object storage, and explore <a href=\"https:\/\/github.com\/pixeltable\/pixeltable\" target=\"_blank\" rel=\"noreferrer noopener\">Pixeltable&#8217;s examples<\/a> to see multimodal workflows in action.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to eliminate custom glue code and data alignment issues in multimodal AI pipelines by migrating media and metadata with NiFi and Airbyte, versioning with lakeFS, and using Pixeltable on Backblaze B2 to create queryable, production-ready training assets.<\/p>\n","protected":false},"author":224,"featured_media":112740,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[7,434,482,484],"tags":[489,468],"class_list":["post-112739","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-storage","category-featured-1","category-hybrid-cloud","category-partner-news","tag-ai-ml","tag-b2cloud","entry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A Developer&#039;s Guide to Migrating Multimodal AI Training Data (and Putting It to Work) with 
Pixeltable<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Developer&#039;s Guide to Migrating Multimodal AI Training Data (and Putting It to Work) with Pixeltable\" \/>\n<meta property=\"og:description\" content=\"Learn how to eliminate custom glue code and data alignment issues in multimodal AI pipelines by migrating media and metadata with NiFi and Airbyte, versioning with lakeFS, and using Pixeltable on Backblaze B2 to create queryable, production-ready training assets.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/\" \/>\n<meta property=\"og:site_name\" content=\"Backblaze Blog | Cloud Storage &amp; Cloud Backup\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/backblaze\" \/>\n<meta property=\"article:published_time\" content=\"2026-01-21T17:53:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-01-21T17:53:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1440\" \/>\n\t<meta property=\"og:image:height\" content=\"820\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Maddie Presland\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@backblaze\" \/>\n<meta name=\"twitter:site\" content=\"@backblaze\" \/>\n<meta 
name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Maddie Presland\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A Developer's Guide to Migrating Multimodal AI Training Data (and Putting It to Work) with Pixeltable","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/","og_locale":"en_US","og_type":"article","og_title":"A Developer's Guide to Migrating Multimodal AI Training Data (and Putting It to Work) with Pixeltable","og_description":"Learn how to eliminate custom glue code and data alignment issues in multimodal AI pipelines by migrating media and metadata with NiFi and Airbyte, versioning with lakeFS, and using Pixeltable on Backblaze B2 to create queryable, production-ready training assets.","og_url":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/","og_site_name":"Backblaze Blog | Cloud Storage &amp; Cloud Backup","article_publisher":"https:\/\/www.facebook.com\/backblaze","article_published_time":"2026-01-21T17:53:01+00:00","article_modified_time":"2026-01-21T17:53:02+00:00","og_image":[{"width":1440,"height":820,"url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1.png","type":"image\/png"}],"author":"Maddie Presland","twitter_card":"summary_large_image","twitter_creator":"@backblaze","twitter_site":"@backblaze","twitter_misc":{"Written by":"Maddie Presland","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/#article","isPartOf":{"@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/"},"author":{"name":"Maddie Presland","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/person\/5a95887c8e781ea9cf10472e47175ce0"},"headline":"A Developer&#8217;s Guide to Migrating Multimodal AI Training Data (and Putting It to Work) with Pixeltable","datePublished":"2026-01-21T17:53:01+00:00","dateModified":"2026-01-21T17:53:02+00:00","mainEntityOfPage":{"@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/"},"wordCount":1506,"commentCount":0,"publisher":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/#primaryimage"},"thumbnailUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1.png","keywords":["AI\/ML","B2Cloud"],"articleSection":["Cloud Storage","Featured","Hybrid Cloud","Partner 
News"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/","url":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/","name":"A Developer's Guide to Migrating Multimodal AI Training Data (and Putting It to Work) with Pixeltable","isPartOf":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/#primaryimage"},"image":{"@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/#primaryimage"},"thumbnailUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1.png","datePublished":"2026-01-21T17:53:01+00:00","dateModified":"2026-01-21T17:53:02+00:00","breadcrumb":{"@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/#primaryimage","url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Heade
r-1440x820-1.png","contentUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1.png","width":1440,"height":820,"caption":"A decorative image showing gears and a cloud."},{"@type":"BreadcrumbList","@id":"https:\/\/www.backblaze.com\/blog\/a-developers-guide-to-migrating-multimodal-ai-training-data-and-putting-it-to-work-with-pixeltable\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/"},{"@type":"ListItem","position":2,"name":"A Developer&#8217;s Guide to Migrating Multimodal AI Training Data (and Putting It to Work) with Pixeltable"}]},{"@type":"WebSite","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#website","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/","name":"Backblaze Cloud Solutions Blog","description":"Cloud Storage &amp; Cloud Backup","publisher":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization","name":"Backblaze","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/www.backblaze.com\/blog\/wp-content\/uploads\/2017\/12\/backblaze_icon_transparent.png?fit=512%2C512&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.backblaze.com\/blog\/wp-content\/uploads\/2017\/12\/backblaze_icon_transparent.png?fit=512%2C512&ssl=1","width":512,"height":512,"caption":"Backblaze"},"image":{"@id":"https:\/\/backblazeprod.wpenginep
owered.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/backblaze","https:\/\/x.com\/backblaze","https:\/\/www.youtube.com\/user\/Backblaze","https:\/\/en.wikipedia.org\/wiki\/Backblaze"]},{"@type":"Person","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/person\/5a95887c8e781ea9cf10472e47175ce0","name":"Maddie Presland","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/10\/Backblaze_Author-Maddie-Presland_Square-150x150.jpg","url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/10\/Backblaze_Author-Maddie-Presland_Square-150x150.jpg","contentUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/10\/Backblaze_Author-Maddie-Presland_Square-150x150.jpg","caption":"Maddie Presland"},"description":"Maddie Presland is a Product Marketing Manager at Backblaze specializing in app storage use cases for multi-cloud architectures and AI. Maddie has more than five years of experience as a product marketer focusing on cloud infrastructure and developing technical marketing content for developers. With a background in journalism, she combines storytelling with her technical curiosity and ability to crash course just about anything. 
Connect with her on LinkedIn.","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/author\/maddiepresland\/"}]}},"jetpack_featured_media_url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2026\/01\/Q126-0010-Blog-Header-1440x820-1.png","_links":{"self":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts\/112739","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/users\/224"}],"replies":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/comments?post=112739"}],"version-history":[{"count":0,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts\/112739\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/media\/112740"}],"wp:attachment":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/media?parent=112739"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/categories?post=112739"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/tags?post=112739"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}