{"id":112071,"date":"2025-05-01T10:29:48","date_gmt":"2025-05-01T17:29:48","guid":{"rendered":"https:\/\/www.backblaze.com\/blog\/?p=112071"},"modified":"2025-12-12T13:02:29","modified_gmt":"2025-12-12T21:02:29","slug":"iceberg-on-backblaze-b2","status":"publish","type":"post","link":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/","title":{"rendered":"Iceberg on Backblaze B2"},"content":{"rendered":"\r\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"534\" class=\"wp-image-112072\" src=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png\" alt=\"A decorative image showing icons of different file types on a grid superimposed over a cloud.\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png 936w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code-300x171.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code-768x438.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><\/figure>\r\n\r\n\r\n\r\n<div class=\"wp-block-spacer\" style=\"height: 10px;\" aria-hidden=\"true\">\u00a0<\/div>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">If you work with cloud storage and data lakes, you\u2019re likely hearing the word \u201cIceberg\u201d with increasing frequency, occasionally prefixed by \u201cApache\u201d. What is Apache Iceberg, and how can you leverage it to efficiently store data in object stores such as Backblaze B2 Cloud Storage? 
I\u2019ll answer both of those questions in this blog post.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">But, first, join me on a brief trip back in time to the beginning of the twenty-first century, a long-ago time before the emergence of big data and cloud computing.<\/p>\r\n\r\n\r\n\r\n<div class=\"abstract\" style=\"line-height: 1.8; margin: 24px 12px; padding: 24px 12px 10px 12px;\">\r\n<h4>A timely shoutout to the Data Council conference<\/h4>\r\nWe recently attended the <a href=\"https:\/\/www.datacouncil.ai\/bay-2025\" target=\"_blank\" rel=\"noopener noreferrer\">2025 Data Council<\/a> conference and caught an excellent presentation by Ryan Blue, co-creator of Apache Iceberg (featuring some very entertaining slides). If you want to hear more about topics like this one, feel free to join us at <a href=\"https:\/\/www.brighttalk.com\/series\/7325?utm_source=BackblazeNA&amp;utm_medium=BrightTALK&amp;utm_campaign=7325\" target=\"_blank\" rel=\"noopener noreferrer\">Backblaze Weekly<\/a>, an ongoing webinar series where we discuss all things Backblaze.<\/div>\r\n\r\n\r\n<div class=\"wp-block-image\">\r\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"702\" class=\"wp-image-112073\" src=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_1_Ryan-Blue-at-Data-Council.png\" alt=\"An image of Ryan Blue speaking at the 2025 Data Council conference. 
\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_1_Ryan-Blue-at-Data-Council.png 936w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_1_Ryan-Blue-at-Data-Council-300x225.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_1_Ryan-Blue-at-Data-Council-768x576.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/>\r\n<figcaption class=\"wp-element-caption\">Ryan Blue speaking at the 2025 Data Council conference. Note: His shirt says \u201cthe future is open\u201d. We agree!<\/figcaption>\r\n<\/figure>\r\n<\/div>\r\n\r\n\r\n<div class=\"wp-block-spacer\" style=\"height: 15px;\" aria-hidden=\"true\">\u00a0<\/div>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">CSV: The lingua franca of tabular data<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">In the early 2000s, if you were working with tabular data, you were likely using either a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Relational_database\" target=\"_blank\" rel=\"noreferrer noopener\">relational database management system (RDBMS)<\/a>, such as Oracle Database, or a spreadsheet, likely Microsoft Excel.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Data stored in an RDBMS is highly structured, meaning that it MUST conform to a predefined schema. For example, you might create an employee table with columns such as first name, last name, date of birth, hire date, and so on. The database schema holds metadata such as the name and data type of each column, whether that column must have a value, relationships between tables, and so on.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A spreadsheet, on the other hand, has <em>some<\/em> structure\u2014data is arranged in rows and columns, similarly to an RDBMS\u2013but each cell can contain anything: text, a number, a formula referencing other cells, even an image in today\u2019s spreadsheets. 
We say that a spreadsheet is <em>semi-structured data<\/em>.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">At the turn of the century, each database and spreadsheet had its own proprietary file format, optimized for its own requirements, and often not at all publicly documented, but the need to be able to exchange data between applications led to broad adoption of a file format to allow just that: comma-separated values, or CSV.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Here\u2019s a simple example of some tabular data represented as CSV:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">employee_id,first_name,last_name,reports_to,job_title,is_manager<br \/>1,Gleb,Budman,,CEO,1<br \/>123,Patrick,Thomas,1,\"VP of Marketing\",1<br \/>45,Yev,Pusin,123,\"Head of Communications and Community\",1<br \/>678,Pat,Patterson,45,\"Chief Technical Evangelist\",0<\/pre>\r\n\r\n\r\n\r\n<div class=\"wp-block-spacer\" style=\"height: 8px;\" aria-hidden=\"true\">\u00a0<\/div>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">CSV is simple and flexible enough that it was easy for me to type that example up manually and import it into Microsoft Excel with no problems at all. Note that, as well as the commas, the double quotes in the CSV data are part of the file format, and do not appear in the imported data:<\/p>\r\n\r\n\r\n\r\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"362\" class=\"wp-image-112074\" src=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_2_Excel-spreadsheet.png\" alt=\"A screenshot of an Excel spreadsheet. 
\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_2_Excel-spreadsheet.png 936w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_2_Excel-spreadsheet-300x116.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_2_Excel-spreadsheet-768x297.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><\/figure>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">CSV has a lot of advantages: It\u2019s simple; flexible; widely understood; the optional header line means that data can be somewhat self-describing; and it\u2019s not controlled by any single vendor.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">CSV does, however, also have a few disadvantages, including:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>There\u2019s no schema; nothing in that file expresses that the values in the first column, apart from the header, must be integers.<\/li>\r\n\r\n\r\n\r\n<li>It\u2019s difficult to represent complex or hierarchical datasets.<\/li>\r\n\r\n\r\n\r\n<li>Data is stored as text, which is inefficient for numerical and repetitive data. Text representations of numbers occupy more storage than binary, and applications must convert them to binary when loading the file and convert them back to text when saving it.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Avro, Parquet and ORC: File formats for big data<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The emergence of open-source distributed computing frameworks such as Apache Hadoop and, later, Apache Spark, in the first two decades of this century drove the creation and adoption of more efficient ways of storing tabular data. 
<a href=\"https:\/\/avro.apache.org\/\">Avro<\/a>, <a href=\"https:\/\/parquet.apache.org\/\">Parquet<\/a> and <a href=\"https:\/\/orc.apache.org\/\">ORC<\/a>, all Apache projects, are binary file formats that address shortcomings of CSV, such as encapsulating schema alongside the data.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Avro, like CSV, is designed for <em>row-oriented<\/em> data, which makes it well-suited to use cases that involve appending new data to files. Parquet and ORC, in contrast, are <em>column-oriented<\/em> file formats, perfect for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Online_analytical_processing\">online analytical processing (OLAP)<\/a> use cases where, for example, an application might read an entire column from a table to calculate the sum of its values. As well as storing numbers in a binary representation, Parquet and ORC can also reduce file size through compression strategies such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Run-length_encoding\">run-length encoding<\/a>.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Here\u2019s a concrete example: The <a href=\"https:\/\/www.backblaze.com\/cloud-storage\/resources\/hard-drive-test-data\">Drive Stats<\/a> data set for December 2024 occupies 3.7GB of storage in CSV format. As Parquet, the same data consumes just 242MB, a data compression ratio of more than 15:1.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Why does it matter if your dataset is smaller? Well, beyond just <a href=\"https:\/\/www.backblaze.com\/blog\/calculate-cost-cloud-storage\/\" target=\"_blank\" rel=\"noreferrer noopener\">cost savings<\/a>, which are amplified when dealing with huge datasets, smaller files mean that running queries against full datasets <a href=\"https:\/\/medium.com\/art-of-data-engineering\/handling-large-datasets-in-sql-2da0f435fb3c\" target=\"_blank\" rel=\"noreferrer noopener\">takes less time<\/a>, which reduces server load, compute costs, and so on. 
\u00a0<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">From file formats to table formats and data lakes<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Apache_Hadoop\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Hadoop<\/a>\u2019s original use case was as an implementation of <a href=\"https:\/\/en.wikipedia.org\/wiki\/MapReduce\" target=\"_blank\" rel=\"noreferrer noopener\">MapReduce<\/a>, a programming model for manipulating large datasets. Engineers at Facebook, tasked with allowing SQL queries over datasets generated by Hadoop, created <a href=\"https:\/\/en.wikipedia.org\/wiki\/Apache_Hive\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Hive<\/a>, and, with it, the Hive <em>table format<\/em>, which specified how to view a collection of files as a single logical table. The Hive table format in turn allowed organizations to create <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_lake\" target=\"_blank\" rel=\"noreferrer noopener\">data lakes<\/a>, repositories that store structured and semi-structured data in their original format for analysis by a wide range of tools, and, later, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_lake#Data_lakehouses\" target=\"_blank\" rel=\"noreferrer noopener\">data lakehouses<\/a>, which aim to combine the benefits of data lakes and traditional data warehouses by storing structured data using data lake tools and technologies.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">A key concept of the Hive table format is <em>partitioning<\/em>, a way of organizing files to reduce the amount of data that must be read to process a query. 
Taking the Drive Stats dataset as an example, we can partition the files by year and month, so that each file has a prefix of the form:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">\/drivestats\/year={year}\/month={month}\/<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For example:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">\/drivestats\/year=2024\/month=12\/<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">With this partitioning scheme, a system processing a query for hard drive statistics for, say, December 12, 2024, need only retrieve files with the above prefix. You might be wondering, \u201cWhy not partition the data on day, also, to further reduce the number of files that must be retrieved?\u201d The answer depends on the data volume and access patterns. It\u2019s much more efficient to partition data into fewer large files than many small files, so overly granular partitioning can actually impair performance.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">It\u2019s worth mentioning that file formats and table formats are largely independent of each other. 
You can use Avro, Parquet, ORC, or even CSV files with the Hive table format.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For more detail on the Parquet file format, Hive table format, and partitioning, see the blog post, <a href=\"https:\/\/www.backblaze.com\/blog\/storing-and-querying-analytical-data-in-backblaze-b2\/\" target=\"_blank\" rel=\"noreferrer noopener\">Storing and Querying Analytical Data in Backblaze B2<\/a>.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">\u201cIceberg, captain, dead ahead!\u201d<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">While the Hive table format served the big data community well for several years, it had a number of shortcomings:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li>Every query incurs a file list (\u201clist objects\u201d, in S3 API terms) operation, which is particularly expensive with cloud object storage, both in terms of time and API transaction charges.<\/li>\r\n\r\n\r\n\r\n<li>Deleting or modifying data typically implies rewriting an entire data file, even if only a single row was affected.<\/li>\r\n\r\n\r\n\r\n<li>Hive can only partition datasets on columns that are in the table schema. For example, the Drive Stats data set includes a <code>date<\/code> column, so to use it with Hive, we had to create additional, redundant, <code>year<\/code> and <code>month<\/code> columns.<\/li>\r\n\r\n\r\n\r\n<li>Any changes to the data schema or partitioning strategy require affected files to be rewritten, making schema evolution problematic, if not infeasible, for large datasets.<\/li>\r\n\r\n\r\n\r\n<li>There is limited support for the kind of <a href=\"https:\/\/en.wikipedia.org\/wiki\/ACID\" target=\"_blank\" rel=\"noreferrer noopener\">ACID (Atomic, Consistent, Isolated, Durable) transactions<\/a> that are familiar from the RDBMS world. 
Attempts to add transaction support to Hive were not widely or consistently supported.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">As a result, vendors and the broader big data community formed a number of projects to define new table formats to succeed Hive, including <a href=\"https:\/\/iceberg.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Iceberg<\/a>, <a href=\"https:\/\/hudi.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Apache Hudi<\/a>, and <a href=\"https:\/\/delta.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Delta Lake<\/a>, a Linux Foundation project.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The three are broadly comparable in terms of features, but, over the past couple of years, Iceberg has emerged as the leader in terms of vendor adoption, with <a href=\"https:\/\/www.snowflake.com\/en\/blog\/storage-iceberg-tables-now-generally-available\/\" target=\"_blank\" rel=\"noreferrer noopener\">Snowflake announcing general availability of Iceberg tables in June 2024<\/a>, and <a href=\"https:\/\/aws.amazon.com\/blogs\/aws\/new-amazon-s3-tables-storage-optimized-for-analytics-workloads\/\" target=\"_blank\" rel=\"noreferrer noopener\">Amazon announcing S3 Tables, its managed Iceberg offering, in December 2024<\/a>. 
Significantly, Databricks, the prime mover behind Delta Lake, <a href=\"https:\/\/www.databricks.com\/blog\/databricks-tabular\" target=\"_blank\" rel=\"noreferrer noopener\">acquired Tabular, a company founded by the original creators of Apache Iceberg, in June 2024<\/a>, establishing its own beachhead in the Iceberg community.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Iceberg\u2019s features allow it to organize huge datasets efficiently and flexibly:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><strong>Table metadata<\/strong>, including the list of files that make up a table, is stored as JSON data alongside the data files, eliminating the need to run an expensive list objects operation for every query.<\/li>\r\n\r\n\r\n\r\n<li><strong>Schema evolution<\/strong> allows you to add, drop, update, or rename columns.<\/li>\r\n\r\n\r\n\r\n<li><strong>Hidden partitioning<\/strong> decouples partitioning from the table schema. For example, you can partition data like the Drive Stats dataset by year and month based on the existing date values, without creating additional columns.<\/li>\r\n\r\n\r\n\r\n<li><strong>Partition layout evolution<\/strong> allows you to modify your partitioning strategy as data volume or access patterns change.<\/li>\r\n\r\n\r\n\r\n<li><strong>Time travel<\/strong> allows you to query earlier table snapshots, so you can see the data as it was at a past point in time.<\/li>\r\n\r\n\r\n\r\n<li><strong>Serializable isolation<\/strong> provides atomic table changes, ensuring readers never see inconsistent data.<\/li>\r\n\r\n\r\n\r\n<li><strong>Multiple concurrent writers<\/strong> use optimistic concurrency, retrying to ensure that compatible updates succeed while detecting conflicting writes.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Iceberg is widely supported across the big data ecosystem, with many applications and tools allowing you to store Iceberg tables in S3-compatible cloud object storage such as Backblaze B2. 
In this article, I\u2019ll look at the simplest use case, running queries against the Drive Stats dataset, with three representative examples: Snowflake, Trino, and DuckDB.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Writing Iceberg data to Backblaze B2<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">I wrote a simple Python application, drivestats2iceberg, using the <a href=\"https:\/\/py.iceberg.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">PyIceberg<\/a> library, that converts the Drive Stats dataset from the zipped CSV files we publish to Parquet files in an Iceberg table stored in a Backblaze B2 Bucket. There are some useful techniques in drivestats2iceberg, and it is <a href=\"https:\/\/github.com\/backblaze-b2-samples\/drivestats2iceberg\" target=\"_blank\" rel=\"noreferrer noopener\">published on GitHub as open source<\/a>, under the MIT license, so feel free to use it as a starting point for your own data conversion apps.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Querying Iceberg tables in Backblaze B2 from Snowflake<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.snowflake.com\/en\/\" target=\"_blank\" rel=\"noreferrer noopener\">Snowflake<\/a> is a data-as-a-service platform addressing a wide variety of use cases, including artificial intelligence (AI), machine learning (ML), collaboration across organizations, and data lakes.<\/p>\r\n\r\n\r\n<div class=\"wp-block-image\">\r\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"534\" class=\"wp-image-112075\" src=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_3_Backblaze-and-Snowflake.png\" alt=\"A decorative image showing the Backblaze and Snowflake logos superimposed over a cloud that dissolves into binary 0s and 1s. 
\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_3_Backblaze-and-Snowflake.png 936w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_3_Backblaze-and-Snowflake-300x171.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/Iceberg-on-B2_3_Backblaze-and-Snowflake-768x438.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/>\r\n<figcaption class=\"wp-element-caption\">We\u2019re big fans of the <a href=\"https:\/\/www.backblaze.com\/blog\/data-driven-decisions-wwith-snowflake-and-backblaze-b2\/\" target=\"_blank\" rel=\"noreferrer noopener\">Backblaze + Snowflake integration.<\/a> Our <a href=\"https:\/\/www.backblaze.com\/cloud-storage\/case-studies\/amplify\" target=\"_blank\" rel=\"noreferrer noopener\">customers<\/a> are too.<\/figcaption>\r\n<\/figure>\r\n<\/div>\r\n\r\n\r\n<p class=\"wp-block-paragraph\">As I mentioned above, Snowflake announced general availability of its Iceberg tables offering in June 2024, allowing you to manipulate Iceberg tables located on external volumes, outside your Snowflake warehouse, and query them alongside data in Snowflake-managed tables.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Snowflake\u2019s Iceberg implementation is quite complicated, with different capabilities according to your choice of cloud object storage provider and whether you want Snowflake to manage your Iceberg catalog or use a catalog integration.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">For our simple use case, where the Iceberg metadata and data files already exist in a Backblaze B2 Bucket, the first step is to create a Snowflake external volume, configuring it with suitable credentials and the location of the Drive Stats data.<\/p>\r\n\r\n\r\n\r\n<p class=\"has-background wp-block-paragraph\" style=\"background-color: #e6e3ff;\">Note: the application key shown in this Snowflake statement has read-only 
access to the <code>drivestats-iceberg<\/code> bucket. You can use it to query the Drive Stats data set from your own Snowflake instance or from other environments.<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">CREATE EXTERNAL VOLUME drivestats_b2<br \/>  STORAGE_LOCATIONS = (<br \/>    (<br \/>      NAME = 'b2_storage_location'<br \/>      STORAGE_PROVIDER = 'S3COMPAT'<br \/>      STORAGE_BASE_URL = 's3compat:\/\/drivestats-iceberg\/'<br \/>      CREDENTIALS = (<br \/>        AWS_KEY_ID = '0045f0571db506a0000000017'<br \/>        AWS_SECRET_KEY = 'K004Fs\/bgmTk5dgo6GAVm2Waj3Ka+TE'<br \/>      )<br \/>      STORAGE_ENDPOINT = 's3.us-west-004.backblazeb2.com'<br \/>    )<br \/>  )<br \/>  ALLOW_WRITES = FALSE;<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Next, you must create a catalog integration. The object store catalog integration simply reads Iceberg metadata from an external (to Snowflake) cloud storage location:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">CREATE CATALOG INTEGRATION my_iceberg_catalog_integration<br \/>  CATALOG_SOURCE = OBJECT_STORE<br \/>  TABLE_FORMAT = ICEBERG<br \/>  ENABLED = TRUE;<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Now you can create an Iceberg table object that references the existing dataset. 
Note that Snowflake requires you to explicitly specify the metadata file to use for column definitions; this is typically the most recently created JSON file under the <code>metadata<\/code> prefix.<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">CREATE ICEBERG TABLE drivestats<br \/>  EXTERNAL_VOLUME = 'drivestats_b2'<br \/>  CATALOG = 'my_iceberg_catalog_integration'<br \/>  METADATA_FILE_PATH = 'drivestats\/metadata\/00225-317608b1-35a6-4135-8393-7543583623db.metadata.json';<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">That done, you can start querying the data:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>How many records are in the current Drive Stats dataset?<\/strong><\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">SELECT COUNT(*) <br \/>FROM drivestats;<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Result:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">564566016<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>How many hard drives was Backblaze spinning on a given date?<\/strong><\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">SELECT COUNT(*) <br \/>FROM drivestats <br \/>WHERE date = DATE '2024-12-31';<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Result:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">305180<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>How many exabytes of raw storage was Backblaze managing on a given date?<\/strong><\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">SELECT ROUND(SUM(CAST(capacity_bytes AS BIGINT))\/1e+18, 2) <br \/>FROM drivestats <br \/>WHERE date = DATE '2024-12-31';<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Result:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">4.42<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><strong>What are the top 10 most common drive models in the dataset?<\/strong><\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">SELECT model, 
COUNT(DISTINCT serial_number) AS count <br \/>FROM drivestats <br \/>GROUP BY model<br \/>ORDER BY count DESC<br \/>LIMIT 10;<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Results (number of distinct drives per model):<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">TOSHIBA MG08ACA16TA   40859<br \/>TOSHIBA MG07ACA14TA   39387<br \/>ST12000NM0007         38843<br \/>ST4000DM000           37040<br \/>ST16000NM001G         34501<br \/>WDC WUH722222ALE6L4   30148<br \/>WDC WUH721816ALE6L4   26547<br \/>ST12000NM0008         21028<br \/>HGST HMS5C4040BLE640  16349<br \/>ST8000NM0055          15680<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">My x-small Snowflake warehouse executed the first three queries in a fraction of a second. As you might expect from its additional complexity, the last query took longer: 16 seconds.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Querying Iceberg tables in Backblaze B2 from Trino<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/trino.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Trino<\/a> is an open-source distributed query engine, <a href=\"https:\/\/trino.io\/blog\/2020\/12\/27\/announcing-trino.html\" target=\"_blank\" rel=\"noreferrer noopener\">formerly known as PrestoSQL<\/a>. Trino can natively query data in Backblaze B2, Cassandra, MySQL, and many other data sources without copying that data into its own dedicated store. 
Trino has become the Backblaze Evangelism Team\u2019s go-to data lake tool over the past few years; <a href=\"https:\/\/www.google.com\/search?q=trino+site%3Abackblaze.com%2Fblog\" target=\"_blank\" rel=\"noreferrer noopener\">we\u2019ve used it in several past blog posts<\/a>, and we maintain a <a href=\"https:\/\/github.com\/backblaze-b2-samples\/trino-getting-started-b2\/\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub repository with quick start guides for running Trino with Backblaze B2<\/a>.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">To access the Drive Stats data set from Trino, you must configure <a href=\"https:\/\/trino.io\/docs\/current\/connector\/iceberg.html\" target=\"_blank\" rel=\"noreferrer noopener\">its Iceberg connector<\/a> with a catalog properties file. For example, to configure a catalog named <code>drivestats_b2<\/code>, create a file <code>etc\/catalog\/drivestats_b2.properties<\/code>:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">connector.name=iceberg<br \/><br \/>hive.metastore.uri=thrift:\/\/hive-metastore:9083<br \/><br \/>iceberg.register-table-procedure.enabled=true<br \/><br \/>fs.native-s3.enabled=true<br \/><br \/>s3.endpoint=https:\/\/s3.us-west-004.backblazeb2.com<br \/>s3.region=us-west-004<br \/>s3.aws-access-key=0045f0571db506a0000000017<br \/>s3.aws-secret-key=K004Fs\/bgmTk5dgo6GAVm2Waj3Ka+TE<br \/>s3.exclusive-create=false<\/pre>\r\n\r\n\r\n\r\n<p class=\"has-background wp-block-paragraph\" style=\"background-color: #e6e3ff;\">Note that the above configuration file uses the same read-only credentials as the Snowflake example. 
You can use this configuration file as-is to explore the Drive Stats dataset using Trino.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Start the Trino server and CLI, then create a Trino schema with the location of the data, and set it as the default schema for subsequent queries:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">CREATE SCHEMA drivestats_b2.ds_schema<br \/>    WITH (location = 's3:\/\/drivestats-iceberg\/');<br \/>USE drivestats_b2.ds_schema;<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">The Trino Iceberg connector provides the <code>register_table<\/code> procedure for registering existing Iceberg tables into the metastore. Optionally, you can provide an additional <code>metadata_file_name<\/code> parameter if you wish to register the table with some specific table state, or if the connector cannot automatically determine the metadata version to use.<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">CALL drivestats_b2.system.register_table(<br \/>    schema_name =&gt; 'ds_schema',<br \/>    table_name =&gt; 'drivestats',<br \/>    table_location =&gt; 's3:\/\/drivestats-iceberg\/drivestats'<br \/>);<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Since you can query the table using the exact same SQL queries as in the Snowflake example, producing the exact same results, I won\u2019t reproduce them here. Running Trino in a Docker container on my MacBook Pro, the first three queries executed in less than three seconds; the fourth took just over a minute.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Querying Iceberg tables in Backblaze B2 from DuckDB<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/duckdb.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">DuckDB<\/a> is an open-source column-oriented RDBMS, intended for in-process use: embedded in applications. 
There are <a href=\"https:\/\/duckdb.org\/docs\/stable\/clients\/overview\" target=\"_blank\" rel=\"noreferrer noopener\">DuckDB client APIs<\/a> (also known as drivers) for many programming languages, including Python, Java, JavaScript (Node.js) and Go.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">DuckDB is focused on the same kinds of use cases as Snowflake and Trino; it is effectively the OLAP equivalent to SQLite, which targets online transaction processing (OLTP) workloads.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">To work with Iceberg tables in cloud object storage, you must install and load the <code>httpfs<\/code> and <code>iceberg<\/code> DuckDB extensions:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">INSTALL httpfs;\r\nLOAD httpfs;\r\n\r\nINSTALL iceberg;\r\nLOAD iceberg;<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Now, you need to create a <a href=\"https:\/\/duckdb.org\/docs\/stable\/configuration\/secrets_manager.html\" target=\"_blank\" rel=\"noreferrer noopener\">secret<\/a> with your Backblaze B2 credentials.<\/p>\r\n\r\n\r\n\r\n<p class=\"has-background wp-block-paragraph\" style=\"background-color: #e6e3ff;\">Again, the application key shown here has read-only access to the Drive Stats dataset; you can use it to explore the data yourself if you like.<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">CREATE SECRET secret (<br \/>    TYPE s3,<br \/>    KEY_ID '0045f0571db506a0000000017',<br \/>    SECRET 'K004Fs\/bgmTk5dgo6GAVm2Waj3Ka+TE',<br \/>    REGION 'us-west-004',<br \/>    ENDPOINT 's3.us-west-004.backblazeb2.com'<br \/>);<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">By default, queries against Iceberg tables in DuckDB use a <code>SELECT ... 
FROM iceberg_scan(...)<\/code> syntax, but you can define a schema and a view so that you can use the same SQL queries as with Snowflake and Trino:<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">First, a schema:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">CREATE SCHEMA ds_schema;<br \/>USE ds_schema;<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Then, a view:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">CREATE VIEW drivestats AS <br \/>    SELECT *<br \/>    FROM iceberg_scan(<br \/>        's3:\/\/drivestats-iceberg\/drivestats', <br \/>        version = '?',<br \/>        allow_moved_paths = true<br \/>    );<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Note: the <code>version = '?'<\/code> parameter tells DuckDB to examine the table\u2019s metadata files and \u201cguess\u201d which one corresponds to the latest version. This behavior is not enabled by default, so you must set <code>unsafe_enable_version_guessing<\/code> to <code>true<\/code> before you query the data, like this:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-preformatted\">SET unsafe_enable_version_guessing = true;<\/pre>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">That done, you can query the table using the exact same SQL queries as with Snowflake and Trino, with the exact same results. With DuckDB on my MacBook Pro, the first three queries took about 15\u201325 seconds; the fourth about 90 seconds.<\/p>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Note that Snowflake, Trino and DuckDB are very different systems, with different trade-offs between cost, performance, and flexibility. 
I\u2019ve included the execution times I saw to set your expectations when working with these tools, rather than as a point of comparison between them.<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">What\u2019s next for Apache Iceberg?<\/h2>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">Apache Iceberg is much more than a table format specification; it\u2019s a broad, thriving ecosystem that is constantly adding new features, with progress tracked in <a href=\"https:\/\/github.com\/apache\/iceberg\" target=\"_blank\" rel=\"noreferrer noopener\">its own GitHub repository<\/a>. Here are a few technologies that are currently in active development:<\/p>\r\n\r\n\r\n\r\n<ul class=\"wp-block-list\">\r\n<li><a href=\"https:\/\/github.com\/apache\/iceberg\/issues\/10392\" target=\"_blank\" rel=\"noreferrer noopener\">Variant Data Type Support<\/a> will offer a more efficient, versatile approach to managing hierarchical, JSON-like data, aligning with Apache Spark\u2019s variant format.<\/li>\r\n\r\n\r\n\r\n<li><a href=\"https:\/\/github.com\/apache\/iceberg\/issues\/10043\" target=\"_blank\" rel=\"noreferrer noopener\">Materialized Views<\/a> will allow you to define a view as you usually would, in terms of a query against one or more existing views or tables, but one that also stores data, like a table. On creation, the materialized view is populated with data and then functions as a cache, serving its data in response to queries. 
The materialized view can be periodically refreshed to keep it in sync with its sources.<\/li>\r\n\r\n\r\n\r\n<li><a href=\"https:\/\/github.com\/apache\/iceberg\/issues\/10260\" target=\"_blank\" rel=\"noreferrer noopener\">Geospatial Support<\/a> will add Iceberg-native data types and operations for the storage and analysis of geospatial data, allowing you to define columns as points, lines and polygons, and use conditions such as \u201cintersects\u201d in queries.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"wp-block-paragraph\">I\u2019ve only scratched the surface of Apache Iceberg in this blog post. Stay tuned for deeper dives into using Snowflake, Trino, DuckDB and more platforms and tools with the Iceberg table format and Backblaze B2 Cloud Storage.<\/p>\r\n","protected":false},"excerpt":{"rendered":"<p>Chief Technical Evangelist Pat Patterson discusses how you can leverage Apache Iceberg to efficiently store data in object stores like Backblaze B2 Cloud Storage. <\/p>\n","protected":false},"author":174,"featured_media":112072,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":"","jetpack_post_was_ever_published":false},"categories":[7,434,483],"tags":[468],"class_list":["post-112071","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-storage","category-featured-1","category-tech-lab","tag-b2cloud","entry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to Use DuckDB with Iceberg on Backblaze B2 Cloud Storage<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" 
content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Use DuckDB with Iceberg on Backblaze B2 Cloud Storage\" \/>\n<meta property=\"og:description\" content=\"Chief Technical Evangelist Pat Patterson discusses how you can leverage Apache Iceberg to efficiently store data in object stores like Backblaze B2 Cloud Storage.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/\" \/>\n<meta property=\"og:site_name\" content=\"Backblaze Blog | Cloud Storage &amp; Cloud Backup\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/backblaze\" \/>\n<meta property=\"article:published_time\" content=\"2025-05-01T17:29:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-12T21:02:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png\" \/>\n\t<meta property=\"og:image:width\" content=\"936\" \/>\n\t<meta property=\"og:image:height\" content=\"534\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Pat Patterson\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@backblaze\" \/>\n<meta name=\"twitter:site\" content=\"@backblaze\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Pat Patterson\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"How to Use DuckDB with Iceberg on Backblaze B2 Cloud Storage","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/","og_locale":"en_US","og_type":"article","og_title":"How to Use DuckDB with Iceberg on Backblaze B2 Cloud Storage","og_description":"Chief Technical Evangelist Pat Patterson discusses how you can leverage Apache Iceberg to efficiently store data in object stores like Backblaze B2 Cloud Storage.","og_url":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/","og_site_name":"Backblaze Blog | Cloud Storage &amp; Cloud Backup","article_publisher":"https:\/\/www.facebook.com\/backblaze","article_published_time":"2025-05-01T17:29:48+00:00","article_modified_time":"2025-12-12T21:02:29+00:00","og_image":[{"width":936,"height":534,"url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png","type":"image\/png"}],"author":"Pat Patterson","twitter_card":"summary_large_image","twitter_creator":"@backblaze","twitter_site":"@backblaze","twitter_misc":{"Written by":"Pat Patterson","Est. 
reading time":"16 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/#article","isPartOf":{"@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/"},"author":{"name":"Pat Patterson","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/person\/a724a8aee97b6451107442747cd101a4"},"headline":"Iceberg on Backblaze B2","datePublished":"2025-05-01T17:29:48+00:00","dateModified":"2025-12-12T21:02:29+00:00","mainEntityOfPage":{"@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/"},"wordCount":2801,"commentCount":2,"publisher":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/#primaryimage"},"thumbnailUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png","keywords":["B2Cloud"],"articleSection":["Cloud Storage","Featured","Tech Lab"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/","url":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/","name":"How to Use DuckDB with Iceberg on Backblaze B2 Cloud 
Storage","isPartOf":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/#primaryimage"},"image":{"@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/#primaryimage"},"thumbnailUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png","datePublished":"2025-05-01T17:29:48+00:00","dateModified":"2025-12-12T21:02:29+00:00","breadcrumb":{"@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/#primaryimage","url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png","contentUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png","width":936,"height":534,"caption":"A decorative image showing icons of different file types on a grid superimposed over a cloud."},{"@type":"BreadcrumbList","@id":"https:\/\/www.backblaze.com\/blog\/iceberg-on-backblaze-b2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Iceberg on Backblaze B2"}]},{"@type":"WebSite","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#website","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/","name":"Backblaze Cloud Solutions Blog","description":"Cloud Storage &amp; Cloud 
Backup","publisher":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization","name":"Backblaze","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/www.backblaze.com\/blog\/wp-content\/uploads\/2017\/12\/backblaze_icon_transparent.png?fit=512%2C512&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.backblaze.com\/blog\/wp-content\/uploads\/2017\/12\/backblaze_icon_transparent.png?fit=512%2C512&ssl=1","width":512,"height":512,"caption":"Backblaze"},"image":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/backblaze","https:\/\/x.com\/backblaze","https:\/\/www.youtube.com\/user\/Backblaze","https:\/\/en.wikipedia.org\/wiki\/Backblaze"]},{"@type":"Person","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/person\/a724a8aee97b6451107442747cd101a4","name":"Pat Patterson","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2022\/01\/PatPatterson1920px-150x150.png","url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2022\/01\/PatPatterson1920px-150x150.png","contentUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2022\/01\/PatPatterson1920px-150x150.png","caption":"Pat Patterson"},"description":"Pat Patterson is the former chief technical evangelist at Backblaze. 
Over his three decades in the industry, Pat has built software and communities at Sun Microsystems, Salesforce, StreamSets, and Citrix. In his role at Backblaze, he creates and delivers content tailored to the needs of the hands-on technical professional, acts as the \u201cvoice of the developer\u201d on the Product team, and actively participates in the wider technical community. Outside the office, Pat runs far, having completed ultramarathons up to the 50 mile distance. Catch up with Pat via Bluesky or LinkedIn.","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/author\/pat\/"}]}},"jetpack_featured_media_url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2025\/05\/bb-header-native-code.png","_links":{"self":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts\/112071","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/users\/174"}],"replies":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/comments?post=112071"}],"version-history":[{"count":0,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts\/112071\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/media\/112072"}],"wp:attachment":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/media?parent=112071"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/categories?post=112071"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/tags?post=112071"}],"
curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}