{"id":94211,"date":"2020-03-05T09:14:48","date_gmt":"2020-03-05T17:14:48","guid":{"rendered":"https:\/\/www.backblaze.com\/blog\/?p=94211"},"modified":"2025-12-12T14:17:30","modified_gmt":"2025-12-12T22:17:30","slug":"data-warehouses-data-lakes-and-data-swamps","status":"publish","type":"post","link":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/","title":{"rendered":"The Geography of Big Data Maintenance: Data Warehouses, Data Lakes, and Data Swamps"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-94242 size-full\" title=\"Data Warehouses, Data Lakes, and Data Swamps\" src=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg\" alt=\"Big Data illustration \" width=\"1182\" height=\"673\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg 1182w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header-300x171.jpeg 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header-1024x583.jpeg 1024w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header-768x437.jpeg 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header-560x319.jpeg 560w\" sizes=\"auto, (max-width: 1182px) 100vw, 1182px\" \/><\/p>\n<div class=\"abstract\">\n<p>&#8220;What is Cloud Storage?&#8221; is a series of posts for business leaders and entrepreneurs interested in using the cloud to scale their business without wasting millions in capital on infrastructure. Although the underlying ideas are relatively simple, information about \u201cthe Cloud\u201d is overrun with frustratingly unclear jargon. These guides aim to cut through the hype and give you the information you need to convince stakeholders that scaling your business in the cloud is an essential next step. 
We hope you find them useful and that you will let us know what additional insight you might need. <span style=\"display: block; margin-right: 5%; text-align: right;\">&#8211;The Editors<\/span><\/p>\n<p><strong><em>What is Cloud Storage?<\/em><\/strong><\/p>\n<ul>\n<li><a href=\"\/blog\/a-sandbox-in-the-clouds-software-testing-and-development-in-cloud-storage\/\" target=\"_blank\" rel=\"noopener noreferrer\">A Sandbox in the Clouds: Software Testing and Development in Cloud Storage<\/a><\/li>\n<li><a href=\"\/blog\/data-warehouses-data-lakes-and-data-swamps\/\" target=\"_blank\" rel=\"noopener noreferrer\">The Geography of Big Data Maintenance: Data Warehouses, Data Lakes, and Data Swamps<\/a><\/li>\n<li><a href=\"\/blog\/object-file-block-storage-guide\/\" target=\"_blank\" rel=\"noopener noreferrer\">A Guide to Clouds: Object, File, and Block<\/a><\/li>\n<\/ul>\n<\/div>\n<p id=\"bzdropcap\">\u201cBig Data\u201d is a phrase people love to throw around in advertising and planning documents, despite the fact that the term itself is rarely defined the same way by any two businesses, even among industry leaders. However, everyone can agree on its rapidly growing importance: understanding Big Data and how to leverage it for the greatest value will be a critical organizational concern for the foreseeable future.<\/p>\n<p>So what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? What makes it so big, anyway? Let\u2019s bring Big Data down to size.<\/p>\n<h2><strong>What is Big Data?<\/strong><\/h2>\n<p>First things first: for the purposes of this discussion, \u201cBig\u201d means any amount of data that exceeds the storage or processing capacity of an organization\u2019s existing systems. \u201cData\u201d refers to information stored or processed on a computer. 
Collectively, then, \u201cBig Data\u201d is a massive volume of structured data, unstructured data, or both, that is too large to process effectively using traditional relational database management systems or applications. In more general terms, when your infrastructure is too small to handle the data your business is generating (because the volume of data is too large, it moves too fast, or it simply exceeds the current processing capacity of your systems), you\u2019ve entered the realm of Big Data.<\/p>\n<p>Let\u2019s take a look at the defining characteristics.<\/p>\n<h3><strong>Characteristics of Big Data<\/strong><\/h3>\n<p>Current definitions of Big Data often reference a \u201ctriple (or in some cases quadruple) V\u201d construct for detailing its characteristics. The \u201cV\u201ds are velocity, volume, variety, and variability. We\u2019ll define them for you here:<\/p>\n<h4><strong>Velocity<\/strong><\/h4>\n<p>Velocity refers to the speed at which data is generated: the pace at which it flows in from sources like business processes, application logs, networks, social media sites, sensors, and mobile devices. This speed determines how rapidly data must be processed to meet business demands, which in turn determines the real potential of the data.<\/p>\n<h4><strong>Volume<\/strong><\/h4>\n<p>The term Big Data itself obviously implies significant volume. But beyond just being \u201cbig,\u201d the relative size of a data set is a fundamental factor in determining its value. The volume of data stored by an organization is used to assess its scalability, accessibility, and ease or difficulty of management. A few examples of high-volume data sets are all of the credit card transactions in the United States on a given day, the entire collection of medical records in Europe, and every video uploaded to YouTube in an hour. 
A small-to-moderate volume might be all of the credit card transactions processed by your own business.<\/p>\n<h4><strong>Variety<\/strong><\/h4>\n<p>Variety refers to how many disparate data sources contribute to an organization\u2019s Big Data, along with the intrinsic nature of the data coming from each source. This applies to both structured and unstructured data. Years ago, spreadsheets and databases were the primary sources of data handled by most applications. Today, data is generated in a multitude of formats, such as email, photos, videos, readings from monitoring devices, PDFs, and audio, all of which demand different considerations in analysis applications. This variety of formats can create challenges for storing, mining, and analyzing data.<\/p>\n<h4><strong>Variability<\/strong><\/h4>\n<p>Variability concerns inconsistencies in the data formats coming from any one source. Where variety considers different inputs from different sources, variability considers different inputs from one data source. These differences can complicate the effective management of the data store. Variability may also refer to differences in the speed of the data flow into your <a href=\"https:\/\/www.backblaze.com\/cloud-storage\" target=\"_blank\" rel=\"noopener noreferrer\">storage systems<\/a>. Where velocity refers to the speed of all of your data, variability refers to how different data sets might move at different speeds. Variability can be a concern when the data itself is inconsistent even though the architecture remains constant.<\/p>\n<p>An example from the health sector would be the year-to-year variation in influenza epidemics (when and where they happen, and how they\u2019re reported in different health systems) and in vaccinations (where they are or aren\u2019t available).<\/p>\n<p>Understanding the makeup of Big Data in terms of Velocity, Volume, Variety, and Variability is key when planning Big Data solutions. 
This shared terminology will help you communicate effectively with everyone involved in decision making when you bring Big Data solutions to your team or your wider business. Whether pitching solutions, engaging consultants or vendors, or hearing out the proposals of the IT group, a shared terminology is crucial.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-94244\" src=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2020\/03\/data-lakes-banner-1.jpeg\" alt=\"What is Big Data?\" width=\"1182\" height=\"173\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-1.jpeg 1182w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-1-300x44.jpeg 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-1-1024x150.jpeg 1024w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-1-768x112.jpeg 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-1-560x82.jpeg 560w\" sizes=\"auto, (max-width: 1182px) 100vw, 1182px\" \/><\/p>\n<h2><strong>What is Big Data Used For?<\/strong><\/h2>\n<p>Businesses use Big Data to try to predict future customer behavior based on past patterns and trends. Effective predictive analytics are the metaphorical crystal ball that organizations consult to learn what their customers want and when they want it. Theoretically, the more data collected, the more patterns and trends the business can identify. This information can make all the difference to a successful customer acquisition and retention strategy, and can help create loyal advocates for a business.<\/p>\n<p>In this case, bigger is definitely better! But the method an organization chooses to address its Big Data needs will be a pivotal marker for success in the coming years. 
Choosing your approach begins with understanding the sources of your data.<\/p>\n<h2><strong>Sources of Big Data<\/strong><\/h2>\n<p>Today\u2019s world is incontestably digital: an endless array of gadgets and devices function as our trusted allies on a daily basis. While helpful, these constant companions are also responsible for generating more and more data every day. Smartphones, GPS technology, social media, surveillance cameras, and machine sensors (and the growing number of users behind them) are all producing reams of data moment to moment, and the total has grown exponentially, from roughly 1 zettabyte of data produced in 2009 to more than 35 zettabytes in 2020.<\/p>\n<p>If your business uses an app to receive and process customer orders, logs extensive point-of-sale retail data, or runs massive email marketing campaigns, you may already have untapped sources of insight into your customers.<\/p>\n<p>Once you understand the sources of your data, the next step is understanding the methods for housing and managing it. 
Data Warehouses and Data Lakes are two of the primary types of storage and maintenance systems that you should be familiar with.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-94245 size-full\" title=\"Data Warehouses are a Primary Type of Data Storage\" src=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2020\/03\/data-lakes-banner-2.jpeg\" alt=\"illustration of multiple server stacks\" width=\"1182\" height=\"173\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-2.jpeg 1182w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-2-300x44.jpeg 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-2-1024x150.jpeg 1024w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-2-768x112.jpeg 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-2-560x82.jpeg 560w\" sizes=\"auto, (max-width: 1182px) 100vw, 1182px\" \/><\/p>\n<h2><strong>Where Is Big Data Stored? Data Warehouses &amp; Data Lakes<\/strong><\/h2>\n<p>Although both Data Lakes and Data Warehouses are widely used for Big Data storage, they are not interchangeable terms.<\/p>\n<p>A <strong>Data Warehouse<\/strong> is an electronic system used to organize information. It goes beyond the capabilities of a traditional relational database, which typically houses and organizes data generated by a single source.<\/p>\n<h3><strong>How Do Data Warehouses Work?<\/strong><\/h3>\n<p>A Data Warehouse is a repository for structured, filtered data that has already been processed for a specific purpose. 
A warehouse combines information from multiple sources into a single comprehensive database.<\/p>\n<p>For example, in the retail world, a Data Warehouse may consolidate customer information from point-of-sale systems, the company website, consumer comment cards, and mailing lists. This information can then be used for distribution and marketing purposes: to track inventory movements and customer buying habits, to manage promotions, and to determine pricing policies.<\/p>\n<p>Additionally, the Data Warehouse may incorporate information about company employees, such as demographic data, salaries, and schedules. This type of information can be used to inform hiring practices, set Human Resources policies, and guide other internal practices.<\/p>\n<p>Data Warehouses are fundamental to the efficiency of modern life. For instance:<\/p>\n<h4><strong>Have a plane to catch?<\/strong><\/h4>\n<p>Airline systems rely on Data Warehouses for many operational functions like route analysis, crew assignments, frequent flyer programs, and more.<\/p>\n<h4><strong>Have a headache?<\/strong><\/h4>\n<p>The healthcare sector uses Data Warehouses to aid organizational strategy, help predict patient outcomes, generate treatment reports, and share information with insurance companies, medical aid services, and so forth.<\/p>\n<h4><strong>Are you a solid citizen?<\/strong><\/h4>\n<p>In the public sector, Data Warehouses are mainly used for gathering intelligence and assisting government agencies in maintaining and analyzing individual tax and health records.<\/p>\n<h4><strong>Playing it safe?<\/strong><\/h4>\n<p>In the investment and insurance sectors, Data Warehouses are mainly used to detect and analyze data patterns reflecting customer trends, and to continuously track market fluctuations.<\/p>\n<h4><strong>Have a call to make?<\/strong><\/h4>\n<p>The telecommunications industry uses Data Warehouses to manage product promotions, to drive sales strategies, and to make 
distribution decisions.<\/p>\n<h4><strong>Need a room for the night?<\/strong><\/h4>\n<p>The hospitality industry uses Data Warehouse capabilities to design and cost-effectively implement tailored advertising and marketing programs that reflect client feedback and travel habits.<\/p>\n<p>Data Warehouses are integral to many aspects of everyday business. That said, they aren\u2019t designed to handle an inflow of data in its raw format, such as unprocessed files, objects, or blobs. A Data Lake is the type of repository needed to make use of this raw data. Let\u2019s examine Data Lakes next.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-94246 size-full\" title=\"What is a Data Lake?\" src=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2020\/03\/data-lakes-banner-3.jpeg\" alt=\"Data lake illustration\" width=\"1182\" height=\"173\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-3.jpeg 1182w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-3-300x44.jpeg 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-3-1024x150.jpeg 1024w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-3-768x112.jpeg 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-banner-3-560x82.jpeg 560w\" sizes=\"auto, (max-width: 1182px) 100vw, 1182px\" \/><\/p>\n<h3><strong>What is a Data Lake?<\/strong><\/h3>\n<p>A Data Lake is a vast pool of raw data whose purpose is not yet defined. This data can be structured, unstructured, or both. 
At its best, a Data Lake is a secure and adaptable data storage and maintenance system distinguished by its flexibility, agility, and ease of use.<\/p>\n<p>If you\u2019re considering a business approach that involves Data Lakes, you\u2019ll want to look for solutions that have the following characteristics: they should retain all data and support all data types; they should easily adapt to change; and they should provide quick insights to as wide a range of users as you require.<\/p>\n<h4><strong>Use Cases for Data Lakes<\/strong><\/h4>\n<p>Data Lakes are most helpful when working with streaming data, such as information gathered from machine sensors, live event-based data streams, clickstream tracking, or product\/server logs.<\/p>\n<p>Deployments of Data Lakes typically address one or more of the following business use cases:<\/p>\n<ul>\n<li><strong>Business intelligence and analytics<\/strong> &#8211; analyzing streams of data to determine high-level trends and granular, record-level insights. A good example is the oil and gas industry, which has used the nearly 1.5 terabytes of data it generates daily to increase its efficiency.<\/li>\n<li><strong>Data science<\/strong> &#8211; unstructured data allows for more possibilities in analysis and exploration, enabling innovative applications of machine learning, advanced statistics, and predictive algorithms. State, city, and federal governments around the world are using data science to dig more deeply into the massive amounts of data they collect on traffic, utilities, and pedestrian behavior to design safer, smarter cities.<\/li>\n<li><strong>Data serving<\/strong> &#8211; Data Lakes are usually an integral part of high-performance architectures for applications that rely on fresh or real-time data, including recommender systems, predictive decision engines, or fraud detection tools. 
Customer Data Platforms are a good example of this use case: they pull information from many behavioral and transactional data sources to refine marketing and target it to individual customers.<\/li>\n<\/ul>\n<p>Taken together, the potential applications for Data Lakes in your business seem to promise an endless source of revolutionary insights. But the ongoing maintenance and technical upgrades required for these data sources to retain their relevance and value are substantial. If neglected or mismanaged, Data Lakes quickly devolve into Data Swamps. As such, one of the biggest considerations when weighing this approach is whether you have the <a href=\"\/blog\/calculate-cost-cloud-storage\/\" target=\"_blank\" rel=\"noopener noreferrer\">financial and personnel capacity to manage Data Lakes<\/a> over the long term.<\/p>\n<h3><strong>What is a Data Swamp?<\/strong><\/h3>\n<p>A Data Swamp, put simply, is a Data Lake that no one cared to manage appropriately. Swamps arise when a Data Lake is treated as storage only, without curation, management, retention and lifecycle policies, or metadata. If you planned to work Data Lake-derived insights into your business planning and end up with a Swamp instead, you will be sorely disappointed. You\u2019re paying the same amount to store all of your data, but returning no usable intelligence to your bottom line.<\/p>\n<h2><strong>Final Thoughts on Big Data Maintenance<\/strong><\/h2>\n<p>Any business or organization considering entry into Big Data country will want to plan carefully for how it will store, maintain, and analyze its data. Making the right choices at the outset will help you traverse the developing digital landscape with strategic insights that enable informed decisions and keep you ahead of your competitors. 
We hope this primer on Big Data gives you the confidence to take the appropriate first steps.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Understanding what &#8220;Big Data&#8221; is and how to leverage it can make a huge difference for any business. This post explores Big Data as a concept, including what defines Data Warehouses and Data Lakes, and how to avoid Data Swamps.<\/p>\n","protected":false},"author":144,"featured_media":94242,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":"","jetpack_post_was_ever_published":false},"categories":[7],"tags":[468],"class_list":["post-94211","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-storage","tag-b2cloud","entry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>What is the Definition of Big Data and a Data Warehouse?<\/title>\n<meta name=\"description\" content=\"So what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? Let\u2019s bring Big Data down to size.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is the Definition of Big Data and a Data Warehouse?\" \/>\n<meta property=\"og:description\" content=\"So what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? 
Let\u2019s bring Big Data down to size.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/\" \/>\n<meta property=\"og:site_name\" content=\"Backblaze Blog | Cloud Storage &amp; Cloud Backup\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/backblaze\" \/>\n<meta property=\"article:published_time\" content=\"2020-03-05T17:14:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-12T22:17:30+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"1182\" \/>\n\t<meta property=\"og:image:height\" content=\"673\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Patrick Thomas\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@backblaze\" \/>\n<meta name=\"twitter:site\" content=\"@backblaze\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Patrick Thomas\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"What is the Definition of Big Data and a Data Warehouse?","description":"So what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? 
Let\u2019s bring Big Data down to size.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/","og_locale":"en_US","og_type":"article","og_title":"What is the Definition of Big Data and a Data Warehouse?","og_description":"So what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? Let\u2019s bring Big Data down to size.","og_url":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/","og_site_name":"Backblaze Blog | Cloud Storage &amp; Cloud Backup","article_publisher":"https:\/\/www.facebook.com\/backblaze","article_published_time":"2020-03-05T17:14:48+00:00","article_modified_time":"2025-12-12T22:17:30+00:00","og_image":[{"width":1182,"height":673,"url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg","type":"image\/jpeg"}],"author":"Patrick Thomas","twitter_card":"summary_large_image","twitter_creator":"@backblaze","twitter_site":"@backblaze","twitter_misc":{"Written by":"Patrick Thomas","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/#article","isPartOf":{"@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/"},"author":{"name":"Patrick Thomas","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/person\/7939165675bc36f0862dbe0b25d3657f"},"headline":"The Geography of Big Data Maintenance: Data Warehouses, Data Lakes, and Data Swamps","datePublished":"2020-03-05T17:14:48+00:00","dateModified":"2025-12-12T22:17:30+00:00","mainEntityOfPage":{"@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/"},"wordCount":2190,"commentCount":2,"publisher":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/#primaryimage"},"thumbnailUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg","keywords":["B2Cloud"],"articleSection":["Cloud Storage"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/","url":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/","name":"What is the Definition of Big Data and a Data 
Warehouse?","isPartOf":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/#primaryimage"},"image":{"@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/#primaryimage"},"thumbnailUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg","datePublished":"2020-03-05T17:14:48+00:00","dateModified":"2025-12-12T22:17:30+00:00","description":"So what does Big Data really mean? Who is it for? Where does it come from? Where is it stored? Let\u2019s bring Big Data down to size.","breadcrumb":{"@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/#primaryimage","url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg","contentUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg","width":1182,"height":673,"caption":"Illustration of Data Warehouses, Data Lakes, and Data Swamps"},{"@type":"BreadcrumbList","@id":"https:\/\/www.backblaze.com\/blog\/data-warehouses-data-lakes-and-data-swamps\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/"},{"@type":"ListItem","position":2,"name":"The Geography of Big Data Maintenance: Data Warehouses, Data Lakes, and Data Swamps"}]},{"@type":"WebSite","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#website","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/","name":"Backblaze Cloud Solutions 
Blog","description":"Cloud Storage &amp; Cloud Backup","publisher":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization","name":"Backblaze","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/www.backblaze.com\/blog\/wp-content\/uploads\/2017\/12\/backblaze_icon_transparent.png?fit=512%2C512&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.backblaze.com\/blog\/wp-content\/uploads\/2017\/12\/backblaze_icon_transparent.png?fit=512%2C512&ssl=1","width":512,"height":512,"caption":"Backblaze"},"image":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/backblaze","https:\/\/x.com\/backblaze","https:\/\/www.youtube.com\/user\/Backblaze","https:\/\/en.wikipedia.org\/wiki\/Backblaze"]},{"@type":"Person","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/person\/7939165675bc36f0862dbe0b25d3657f","name":"Patrick Thomas","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2019\/09\/patrick_thomas-e1569451539653-150x150.png","url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2019\/09\/patrick_thomas-e1569451539653-150x150.png","contentUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2019\/09\/patrick_thomas-e1569451539653-150x150.png","caption":"Patrick Thomas"},"description":"Patrick Thomas is the 
Vice President of Marketing at Backblaze. He has managed all aspects of content development and strategy across a number of industries, including literary publishing, gaming, and tech. He has developed and edited New York Times bestsellers and Wall Street Journal books of the year, and written for National Geographic and the San Francisco Chronicle. He loves nothing more than learning, and Backblaze\u2019s steady beat of innovation feeds that love every day. LinkedIn: Patrick Thomas.","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/author\/patrick\/"}]}},"jetpack_featured_media_url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2020\/03\/data-lakes-header.jpeg","_links":{"self":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts\/94211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/users\/144"}],"replies":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/comments?post=94211"}],"version-history":[{"count":0,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts\/94211\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/media\/94242"}],"wp:attachment":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/media?parent=94211"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/categories?post=94211"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/tags?post=94211"}],"curies":[{"name":"wp","href":"https:\/
\/api.w.org\/{rel}","templated":true}]}}