{"id":109959,"date":"2023-10-05T09:15:00","date_gmt":"2023-10-05T16:15:00","guid":{"rendered":"https:\/\/www.backblaze.com\/blog\/?p=109959"},"modified":"2024-08-14T12:01:15","modified_gmt":"2024-08-14T19:01:15","slug":"overload-to-overhaul-how-we-upgraded-drive-stats-data","status":"publish","type":"post","link":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/","title":{"rendered":"Overload to Overhaul: How We Upgraded Drive Stats Data"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"583\" src=\"\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data-1024x583.png\" alt=\"A decorative image showing the words &quot;overload to overhaul: how we upgraded Drive Stats data.&quot; \" class=\"wp-image-109960\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data-1024x583.png 1024w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data-300x171.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data-768x437.png 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data-560x319.png 560w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data.png 1440w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>This year, we\u2019re celebrating <a href=\"\/blog\/10-stories-from-10-years-of-drive-stats-data\/\" target=\"_blank\" rel=\"noreferrer noopener\">10 years of Drive Stats<\/a>. 
Coincidentally, we also made some upgrades to how we run our Drive Stats reports. We previously reported on how an attempt to migrate triggered a weeks-long recalculation of the dataset, leading us to map the architecture of the Drive Stats data.\u00a0<\/p>\n\n\n\n<p>This follow-up article focuses on the improvements we made after we fixed the existing bug (because hey, we were already in there), and then presents some of our ideas for future improvements. Remember that those are just <em>ideas<\/em> so far; they may not be live in a month (or ever?), but consider them good food for thought, and know that we&#8217;re paying attention so that we can pass this info along to the right people.<\/p>\n\n\n\n<p>Now, on to the fun stuff.\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Quick Refresh: Drive Stats Data Architecture<\/h2>\n\n\n\n<p>The podstats generator runs every few minutes on every <a href=\"\/blog\/the-storage-pod-story-innovation-to-commodity\/\" target=\"_blank\" rel=\"noreferrer noopener\">Storage Pod<\/a>, which is what we call any host that holds customer data. It\u2019s a C++ program that collects <a href=\"\/blog\/making-sense-of-ssd-smart-stats\/\" target=\"_blank\" rel=\"noreferrer noopener\">SMART stats<\/a> and a few other attributes, then converts them into an .xml file (\u201cpodstats\u201d). Those files are then pushed to a central host in each datacenter and bundled. Once the data leaves these central hosts, it has entered the domain of what we will call <a href=\"https:\/\/www.backblaze.com\/cloud-storage\/resources\/hard-drive-test-data\" target=\"_blank\" rel=\"noreferrer noopener\">Drive Stats.<\/a>\u00a0\u00a0<\/p>\n\n\n\n<p>Now let\u2019s go into a little more detail: when you\u2019re gathering stats about drives, you\u2019re running a set of modules with dependencies on other modules, forming a data-dependency tree. Each time a module \u201cruns\u201d, it takes information, modifies it, and writes it to disk. 
As you run each module, the data will be transformed sequentially. And, once a quarter, we run a special module that collects all the attributes for our Drive Stats reports, collecting data all the way down the tree.&nbsp;<\/p>\n\n\n\n<p>Here\u2019s a truncated diagram of the whole system, to give you an idea of what the logic looks like:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"302\" src=\"\/wp-content\/uploads\/2023\/09\/Drive-Stats-Data_Module-Logic.png\" alt=\"A diagram of the mapped logic of the Drive Stats modules.\" class=\"wp-image-109672\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/09\/Drive-Stats-Data_Module-Logic.png 936w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/09\/Drive-Stats-Data_Module-Logic-300x97.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/09\/Drive-Stats-Data_Module-Logic-768x248.png 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/09\/Drive-Stats-Data_Module-Logic-560x181.png 560w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><figcaption class=\"wp-element-caption\">An abbreviated logic map of Drive Stats modules. <\/figcaption><\/figure>\n\n\n\n<p>As you move down through the module layers, the logic gets more and more specialized. When you run a module, the first thing the module does is check in with the previous module to make sure the data exists and is current. It caches the data to disk at every step, and fills out the logic tree step by step. So for example, <code>drive_stats<\/code>, being a \u201cper-day\u201d module, will write out a file such as <code>\/data\/drive_stats\/2023-01-01.json.gz<\/code> when it finishes processing. 
This lets future modules read that file to avoid repeating work.<\/p>\n\n\n\n<p>This work deduplication process saves us a lot of time overall\u2014but it also turned out to be the root cause of our weeks-long process when we were migrating Drive Stats to our new host. We fixed that by implementing versions to each module.\u00a0\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">While You\u2019re There\u2026 Why Not Upgrade?<\/h2>\n\n\n\n<p>Once the dust from the bug fix had settled, we moved forward to try to modernize Drive Stats in general. Our daily report still ran quite slowly, on the order of several hours, and there was some low-hanging fruit to chase.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Waiting On You, <code>failures_with_stats<\/code><\/h3>\n\n\n\n<p>First things first, we saved a log of a run of our daily reports in <a href=\"https:\/\/www.jenkins.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Jenkins<\/a>. Then we wrote an analyzer to see which modules were taking a lot of time. <code>failures_with_stats<\/code> was our biggest offender, running for about two hours, while every other module took about 15 minutes.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"926\" height=\"688\" src=\"\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times.png\" alt=\"An image showing runtimes for each module when running a Drive Stats report. 
\" class=\"wp-image-109963\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times.png 926w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times-300x223.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times-768x571.png 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times-560x416.png 560w\" sizes=\"auto, (max-width: 926px) 100vw, 926px\" \/><figcaption class=\"wp-element-caption\">Not <em>quite<\/em> two hours.<\/figcaption><\/figure>\n\n\n\n<p>Upon investigation, the time cost had to do with how the <code>date_range<\/code> module works. This takes us back to caching: our module checks if the file has been written already, and if it has, it uses the cached file. However, a date range is written to a single file. That is, Drive Stats will recognize \u201cMonday to Wednesday\u201d as distinct from \u201cMonday to Thursday\u201d and re-calculate the entire range. This is a problem for a workload that is essentially doing work for all of time, every day.\u00a0\u00a0<\/p>\n\n\n\n<p>On top of this, the raw Drive Stats data, which is a dependency for <code>failures_with_stats<\/code>, would be gzipped onto a disk. When each new query triggered a request to recalculate all-time data, each dependency would pick up the podstats file from disk, decompress it, read it into memory, and do that for every day of all time. 
We were picking up and processing our biggest files every day, and time continued to make that cost larger.<\/p>\n\n\n\n<p>Our solution was what I called the \u201cDate Range Accumulator.\u201d It works as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If we have a date range like \u201call of time as of yesterday\u201d (or any partial range with the same start), consider it as a starting point.<\/li>\n\n\n\n<li>Make sure that the version numbers don\u2019t consider our starting point to be too old.<\/li>\n\n\n\n<li>Do the processing of today\u2019s data on top of our starting point to create \u201call of time as of today.\u201d<\/li>\n<\/ul>\n\n\n\n<p>To do this, we read the directory of the date range accumulator, find the \u201clatest\u201d valid one, and use that to determine the delta (change) to our current date. Basically, the module says: \u201cThe last time I ran this was on data from the beginning of time to Thursday. It\u2019s now Friday. I need to run the process for Friday, and then add that to the compiled all-time.\u201d And, before it does that, it double checks the version number to avoid errors. (As we noted in our previous article, if it doesn\u2019t see the correct version number, instead of inefficiently running all data, it just tells you there is a version number discrepancy.)&nbsp;<\/p>\n\n\n\n<p>The code is also a bit finicky\u2014there are lots of snags when it comes to things like defining exceptions, such as if we took a drive out of the fleet, but it wasn\u2019t a true failure. The module also needed to be processable day by day to be usable with this technique.<\/p>\n\n\n\n<p>Still, even with all the tweaks, it\u2019s massively better from a runtime perspective for eligible candidates. 
Here\u2019s our new <code>failures_with_stats<\/code> runtime:\u00a0<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"366\" src=\"\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times-2.png\" alt=\"An output of module runtime after the Drive Stats improvements were made. \" class=\"wp-image-109962\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times-2.png 936w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times-2-300x117.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times-2-768x300.png 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Module-times-2-560x219.png 560w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><figcaption class=\"wp-element-caption\">Ahh, sweet victory. <\/figcaption><\/figure>\n\n\n\n<p>Note that in this example, we\u2019re running that 60-day report. The daily report is quite a bit quicker. But, at least the 60-day report is a fixed amount of time (as compared with the all-time dataset, which is continually growing).\u00a0<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Code Upgrade to Python 3<\/h3>\n\n\n\n<p>Next, we converted our code to Python 3. (Shout out to our intern, Anath, who did amazing work on this part of the project!) We didn\u2019t make this improvement just to make it; no, we did this because I wanted faster JSON processors, and a lot of the more advanced ones did not work with Python 2. 
When we looked at the time each module took to process, most of that was spent serializing and deserializing JSON.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">What Is JSON Parsing?<\/h4>\n\n\n\n<p>JSON is an open standard file format that uses human-readable text to store and transmit data objects. Many modern programming languages include code to generate and parse JSON-format data. Here\u2019s how you might describe a person named John, aged 30, from New York using JSON:\u00a0<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">{ \n\"name\": \"John\", \n\"age\": 30,\n\"city\": \"New York\"\n}<\/pre>\n\n\n\n<p>You can express those attributes in a single line of Python and define them as a native object (a dictionary):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">x = { 'name':'John', 'age':30, 'city':'New York'}<\/pre>\n\n\n\n<p>\u201cParsing\u201d is the process by which you take the JSON data and turn it into an object that your programming language can work with. You\u2019d write your script (program) in Python, it would parse (interpret) the JSON data, and then give you an answer. This is what that would look like:\u00a0<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import json\n\n# some JSON, held in a Python string:\nx = '''\n{ \n\t\"name\": \"John\", \n\t\"age\": 30,\n\t\"city\": \"New York\"\n}\n'''\n\n# parse x:\ny = json.loads(x)\n\n# the result is a Python dictionary:\nprint(y[\"name\"])\n<\/pre>\n\n\n\n<p>If you run this script, you\u2019ll get the output \u201cJohn.\u201d If you change <code>print(y[\"name\"])<\/code> to <code>print(y[\"age\"])<\/code>, you\u2019ll get the output \u201c30.\u201d Check out <a href=\"https:\/\/www.w3schools.com\/python\/trypython.asp?filename=demo_json\" target=\"_blank\" rel=\"noreferrer noopener\">this website<\/a> if you want to interact with the code for yourself. 
In practice, the JSON would be read from a database, or a web API, or a file on disk rather than defined as a \u201cstring\u201d (or text) in the Python code. If you are converting a lot of this JSON, small improvements in efficiency can make a big difference in how a program performs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">And Implementing UltraJSON<\/h3>\n\n\n\n<p>Upgrading to Python 3 meant we could use <a href=\"https:\/\/github.com\/ultrajson\/ultrajson\" target=\"_blank\" rel=\"noreferrer noopener\">UltraJSON<\/a>. This was approximately 50% faster than the built-in Python JSON library we used previously.\u00a0<\/p>\n\n\n\n<p>We also looked at the XML parsing for the podstats files, since XML parsing is often a slow process. In this case, we actually found our existing tool is pretty fast (and since we wrote it 10 years ago, that\u2019s pretty cool). Off-the-shelf XML parsers take quite a bit longer because they care about a lot of things we don\u2019t have to: our tool is customized for our Drive Stats needs. It\u2019s a well known adage that you should not parse XML with regular expressions, but if your files are, well, very regular, it can save a lot of time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Does the Future Hold?<\/h2>\n\n\n\n<p>Now that we\u2019re working with a significantly faster processing time for our Drive Stats dataset, we\u2019ve got some ideas about upgrades in the future. Some of these are easier to achieve than others. Here\u2019s a sneak peek of some potential additions and changes in the future.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data on Data<\/h3>\n\n\n\n<p>In keeping with our data-nerd ways, I got curious about how much the Drive Stats dataset is growing and if the trend is linear. 
We made this graph, which shows the baseline rolling average, and has a trend line that attempts to predict linearly.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"378\" src=\"\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Growth-rate-of-data.png\" alt=\"A graph showing the rate at which the Drive Stats dataset has grown over time. \" class=\"wp-image-109961\" srcset=\"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Growth-rate-of-data.png 936w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Growth-rate-of-data-300x121.png 300w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Growth-rate-of-data-768x310.png 768w, https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/Drive-Stats-Improvements_2_Growth-rate-of-data-560x226.png 560w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><\/figure>\n\n\n\n<p>I envision this graph living somewhere on the <a href=\"https:\/\/www.backblaze.com\/cloud-storage\/resources\/hard-drive-test-data\" target=\"_blank\" rel=\"noreferrer noopener\">Drive Stats<\/a> page and being fully interactive. It\u2019s just one graph, but this and similar tools available on our website would be 1) fun and 2) lead to some interesting insights for those who don\u2019t dig in line by line.\u00a0<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What About Changing the Data Module?<\/h3>\n\n\n\n<p>The way our current module system works, everything gets processed in a tree approach, and they\u2019re flat files. 
If we used something like <a href=\"https:\/\/www.sqlite.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">SQLite<\/a> or <a href=\"https:\/\/parquet.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Parquet<\/a>, we\u2019d be able to process data in a more depth-first way, and that would mean that we could open a file for one module or data range, process everything, and not have to read the file again.\u00a0<\/p>\n\n\n\n<p>And, since one of the first things that our Drive Stats expert, <a href=\"\/blog\/author\/andy\/\" target=\"_blank\" rel=\"noreferrer noopener\">Andy Klein<\/a>, does with our .xml data is to convert it to SQLite, outputting it in a queryable form would save a lot of time.\u00a0<\/p>\n\n\n\n<p>We could also explore keeping the data as a less-smart filetype, but using something more compact than JSON, such as <a href=\"https:\/\/msgpack.org\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">MessagePack<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can We Improve Failure Tracking and Attribution?<\/h3>\n\n\n\n<p>One of the odd things about our Drive Stats datasets is that they don\u2019t always and automatically agree with our internal data lake. Our Drive Stats outputs have some wonkiness that\u2019s hard to replicate, and it\u2019s mostly because of exceptions we build into the dataset. These exceptions aren\u2019t when a drive fails, but rather when we\u2019ve removed it from the fleet for some other reason, like if we were testing a drive or something along those lines. (You can see specific callouts in Drive Stats reports, if you\u2019re interested.) It\u2019s also where a lot of Andy\u2019s manual work on Drive Stats data comes in each month: he\u2019s often comparing the module\u2019s output with data in our datacenter ticket tracker.<\/p>\n\n\n\n<p>These tickets come from the awesome data techs working in our data centers. 
Each time a drive fails and they have to replace it, our techs add a reason for why it was removed from the fleet. While not all drive replacements are \u201cfailures\u201d, adding a root cause to our Drive Stats dataset would give us more confidence in our failure reporting (and would save Andy comparing the two lists).\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Result: Faster Drive Stats and Future Fun<\/h2>\n\n\n\n<p>These two improvements (the date range accumulator and upgrading to Python 3) resulted in hours, and maybe even days, of work saved. Even from a troubleshooting point of view, we often wouldn&#8217;t know if the process was stuck, or if this was the normal amount of time the module should take to run. Now, if it takes more than about 15 minutes to run a report, you\u2019re sure there\u2019s a problem.\u00a0<\/p>\n\n\n\n<p>While the Drive Stats dataset can\u2019t really be called \u201cbig data\u201d, it provides a good, concrete example of scaling with your data. We\u2019ve been collecting Drive Stats for just over 10 years now, and even though most of the code written way back when is inherently sound, small improvements that seem marginal become amplified as datasets grow.&nbsp;<\/p>\n\n\n\n<p>Now that we\u2019ve got better documentation of how everything works, it\u2019s going to be easier to keep Drive Stats up-to-date with the best tools and run with future improvements.\u00a0Let us know in the comments what you&#8217;d be interested in seeing. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Catch part two of Sr. Software Infrastructure Engineer David Winings&#8217; Drive Stats data journey, where he upgrades data collection and shares his ideas about future improvements. 
<\/p>\n","protected":false},"author":195,"featured_media":109960,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":"","jetpack_post_was_ever_published":false},"categories":[7,434,457],"tags":[468],"class_list":["post-109959","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-storage","category-featured-1","category-hard-drive-stats","tag-b2cloud","entry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Overload to Overhaul: How We Upgraded Drive Stats Data<\/title>\n<meta name=\"description\" content=\"Celebrate a decade of Drive Stats as we unveil improvements and future ideas for optimizing data reporting and enhancing performance.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Overload to Overhaul: How We Upgraded Drive Stats Data\" \/>\n<meta property=\"og:description\" content=\"Celebrate a decade of Drive Stats as we unveil improvements and future ideas for optimizing data reporting and enhancing performance.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/\" \/>\n<meta property=\"og:site_name\" content=\"Backblaze Blog | Cloud Storage &amp; Cloud Backup\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/backblaze\" \/>\n<meta property=\"article:published_time\" content=\"2023-10-05T16:15:00+00:00\" \/>\n<meta property=\"article:modified_time\" 
content=\"2024-08-14T19:01:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1440\" \/>\n\t<meta property=\"og:image:height\" content=\"820\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"David Winings\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@backblaze\" \/>\n<meta name=\"twitter:site\" content=\"@backblaze\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"David Winings\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Overload to Overhaul: How We Upgraded Drive Stats Data","description":"Celebrate a decade of Drive Stats as we unveil improvements and future ideas for optimizing data reporting and enhancing performance.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/","og_locale":"en_US","og_type":"article","og_title":"Overload to Overhaul: How We Upgraded Drive Stats Data","og_description":"Celebrate a decade of Drive Stats as we unveil improvements and future ideas for optimizing data reporting and enhancing performance.","og_url":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/","og_site_name":"Backblaze Blog | Cloud Storage &amp; Cloud 
Backup","article_publisher":"https:\/\/www.facebook.com\/backblaze","article_published_time":"2023-10-05T16:15:00+00:00","article_modified_time":"2024-08-14T19:01:15+00:00","og_image":[{"width":1440,"height":820,"url":"https:\/\/www.backblaze.com\/blog\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data.png","type":"image\/png"}],"author":"David Winings","twitter_card":"summary_large_image","twitter_creator":"@backblaze","twitter_site":"@backblaze","twitter_misc":{"Written by":"David Winings","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/#article","isPartOf":{"@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/"},"author":{"name":"David Winings","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/person\/8e284f21150e5b794b04b4b616e98772"},"headline":"Overload to Overhaul: How We Upgraded Drive Stats Data","datePublished":"2023-10-05T16:15:00+00:00","dateModified":"2024-08-14T19:01:15+00:00","mainEntityOfPage":{"@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/"},"wordCount":2126,"commentCount":0,"publisher":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/#primaryimage"},"thumbnailUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data.png","keywords":["B2Cloud"],"articleSection":["Cloud Storage","Featured","Hard Drive 
Stats"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/","url":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/","name":"Overload to Overhaul: How We Upgraded Drive Stats Data","isPartOf":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/#primaryimage"},"image":{"@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/#primaryimage"},"thumbnailUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data.png","datePublished":"2023-10-05T16:15:00+00:00","dateModified":"2024-08-14T19:01:15+00:00","description":"Celebrate a decade of Drive Stats as we unveil improvements and future ideas for optimizing data reporting and enhancing 
performance.","breadcrumb":{"@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/#primaryimage","url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data.png","contentUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data.png","width":1440,"height":820,"caption":"A decorative image showing the words \"overload to overhaul: how we upgraded Drive Stats data.\""},{"@type":"BreadcrumbList","@id":"https:\/\/www.backblaze.com\/blog\/overload-to-overhaul-how-we-upgraded-drive-stats-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Overload to Overhaul: How We Upgraded Drive Stats Data"}]},{"@type":"WebSite","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#website","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/","name":"Backblaze Cloud Solutions Blog","description":"Cloud Storage &amp; Cloud 
Backup","publisher":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#organization","name":"Backblaze","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/www.backblaze.com\/blog\/wp-content\/uploads\/2017\/12\/backblaze_icon_transparent.png?fit=512%2C512&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.backblaze.com\/blog\/wp-content\/uploads\/2017\/12\/backblaze_icon_transparent.png?fit=512%2C512&ssl=1","width":512,"height":512,"caption":"Backblaze"},"image":{"@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/backblaze","https:\/\/x.com\/backblaze","https:\/\/www.youtube.com\/user\/Backblaze","https:\/\/en.wikipedia.org\/wiki\/Backblaze"]},{"@type":"Person","@id":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/#\/schema\/person\/8e284f21150e5b794b04b4b616e98772","name":"David Winings","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/09\/Backblaze-Author_David-Winings-scaled-e1694047193400-150x150.jpeg","url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/09\/Backblaze-Author_David-Winings-scaled-e1694047193400-150x150.jpeg","contentUrl":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/09\/Backblaze-Author_David-Winings-scaled-e1694047193400-150x150.jpeg","caption":"David 
Winings"},"description":"David Winings is a Sr. Infrastructure Software Engineer at Backblaze. He has a diverse technical background, working across all things infrastructure and data. At heart, he is still a teenager trying to fix his Wi-Fi driver on Ubuntu. He is a northern Virginia native but now lives in Sacramento, CA, exploring the wonders of both technology and dog ownership. Connect with him on GitHub.","url":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/author\/david-winings\/"}]}},"jetpack_featured_media_url":"https:\/\/backblazeprod.wpenginepowered.com\/wp-content\/uploads\/2023\/10\/bb-bh-Overload-to-Overhaul-How-We-Upgraded-Drive-Stats-Data.png","_links":{"self":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts\/109959","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/users\/195"}],"replies":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/comments?post=109959"}],"version-history":[{"count":0,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/posts\/109959\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/media\/109960"}],"wp:attachment":[{"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/media?parent=109959"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/categories?post=109959"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/backblazeprod.wpenginepowered.com\/blog\/wp-json\/wp\/v2\/tags?post=109959"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","
templated":true}]}}