Confessions Of A Digital Pack Rat: Almost Half A Petabyte And Still Growing

May 2nd, 2017

Retired rack server

What do you do when you have almost half a petabyte (PB) of data? That’s the situation in which Michael Oskierko finds himself. He’s a self-proclaimed digital pack rat who’s amassed more than 390 terabytes (TB) total, and it’s continuing to grow.

Based in Texas, Michael Oskierko is a financial analyst by day. But he’s set up one of the biggest personal data warehouses we’ve seen. The Oskierko family has a huge collection of photos, videos, documents and more – much more than most of us. Heck, more data than many companies have.

How Did It Get Like This?

“There was a moment when we were pregnant with our second child,” Michael explained. “I guess it was a nesting instinct. I was looking at pictures of our first child and played them back on a 4K monitor. It was grainy and choppy.”

Disappointed with the quality of those early images, he vowed to store future memories in a pristine state. “I got a DSLR that took great pictures and saved everything in RAW format. That’s about 30 MB per image right there.”

Michael says he now has close to 1 million photos (from many different devices, not just the DSLR) and about 200,000 videos stored in their original formats. Michael says that video footage from his drone alone occupies about 300 GB.

The Oskierkos are also avid music listeners: iTunes counts 707 days’ worth of music in their library at present. Michael keeps Green Day’s entire library on heavy rotation, with a lot of other alternative rock a few clicks away. His wife’s musical tastes are quite broad, ranging from rap to gospel. They’re also avid audiobook listeners, and it all adds up: Dozens more TB of shared storage space dedicated to audio files.

What’s more, he’s kept very careful digital records of stuff that otherwise might have gotten tossed to the curbside years ago. “I have every single note, test, project, and assignment from 7th grade through graduate school scanned and archived,” he tells us. He’s even scanned his textbooks from high school and college!

“I started cutting these up and scanning the pages before the nifty ‘Scan to PDF’ was a real widespread option and duplexing scanners were expensive,” he said.

One of the biggest uses of space isn’t something that Michael needs constant access to, but he’s happy to have when the need arises. As a hobbyist programmer who works in multiple languages and on different platforms, Michael maintains a library of uncompressed disk images (ISOs) which he uses as needed.

When you have this much storage, it’s silly to get greedy with it. Michael operates his sprawling setup as a personal cloud for his family members, as well.

“I have a few hosted websites, and everyone in my family has a preconfigured FTP client to connect to my servers,” he said.

Bargain Hunting For Big Storage

How do you get 390 TB without spending a mint? Michael says it’s all about finding the right deals. The whole thing got started when a former boss asked if Michael would be interested in buying the assets of his shuttered computer repair business. Michael ended up with an inventory of parts which he’s successfully scavenged into the beginning of his 390 TB digital empire.

He’s augmented and improved that over time, evolving his digital library over six distinct storage systems that he uses to maintain all of his family’s personal data. He keeps an eye out wherever he can for good deals.

“There are a few IT support and service places I pass by on my daily commute to work,” he said. He stops in periodically to check if they’re blowing out inventory. eBay and other online auction sites are great places for him to find deals.

“I just bought 100 1 TB drives from a guy on eBay for $4 each,” he said.

Miscellaneous parts

Michael has outgrown and retired a bunch of devices over the years as his storage empire has grown, but he keeps an orderly collection of parts and supplies for when he has to make some repairs.

How To Manage Large Directories: Keep It Simple

“I thoroughly enjoy data archiving and organizing,” Michael said. Perhaps a massive understatement. While he’s looked at Digital Asset Management (DAM) software and other tools to manage his ever-growing library, Michael prefers a more straightforward approach to figuring out what’s where. His focus is on a simplified directory structure.

“I would have to say I spend about 2 hours a week just going through files and sorting things out but it’s fun for me,” Michael said. “There are essentially five top-level directories.”

Documents, installs, disk images, music, and a general storage directory make up the top level of the hierarchy. “I don’t put files in folders with other folders,” he explained. “The problem I run into is figuring out where to go for old archives that are spread across multiple machines.”
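For readers who want to try the same rule on their own drives, here is a minimal sketch in Python that lists any directory holding both files and subdirectories. It is not Michael's tooling; the function name and the sample folder names are made up for illustration.

```python
import os
import tempfile


def find_mixed_directories(root):
    """Return directories under root that hold both files and subdirectories."""
    mixed = []
    for current, subdirs, files in os.walk(root):
        # A directory breaks the rule when it has at least one of each.
        if subdirs and files:
            mixed.append(current)
    return sorted(mixed)


if __name__ == "__main__":
    # Build a small throwaway tree to demonstrate; the names are hypothetical.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "Documents", "Taxes"))
        os.makedirs(os.path.join(root, "Music", "Albums"))
        open(os.path.join(root, "Documents", "Taxes", "2016.pdf"), "w").close()
        # This stray file sits next to a subdirectory, so "Music" is reported.
        open(os.path.join(root, "Music", "stray.mp3"), "w").close()
        for path in find_mixed_directories(root):
            print(os.path.relpath(path, root))
```

Running it against a real library only reads directory listings, so it is safe to point at an archive before deciding what to reorganize.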

How To Back Up That Much Data

Even though he has a high-speed fiber optic connection to the Internet, Michael doesn’t want to use it all for backup. So much of his local backup and duplication is done using cloning and Windows’ built-in Xcopy tool, which he manages using home-grown batch files.
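Michael's batch files aren't shown here, so the following is only a sketch of the general idea in Python: walk a source folder and copy a file to the backup location when the backup copy is missing or older. Xcopy's /D switch behaves in a similar way when no date is given, though we don't know which switches Michael actually uses. The function name and sample paths are assumptions for illustration.

```python
import os
import shutil
import tempfile


def copy_newer(src_root, dst_root):
    """Copy files from src_root to dst_root when the copy is missing or older.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for current, _subdirs, files in os.walk(src_root):
        rel_dir = os.path.relpath(current, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(current, name)
            dst = os.path.join(target_dir, name)
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                # copy2 also carries over the modification time, so an
                # unchanged file is skipped on the next pass.
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)


if __name__ == "__main__":
    # Demonstrate on throwaway folders rather than real data.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "photos"))
        with open(os.path.join(src, "photos", "a.raw"), "w") as f:
            f.write("pixels")
        print(copy_newer(src, dst))  # first pass copies the new file
        print(copy_newer(src, dst))  # second pass finds nothing to copy
```

Comparing modification times keeps repeat runs quick, but it assumes both file systems store timestamps at the same precision; a checksum comparison would be slower and stricter.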

Michael also relies on Backblaze Personal Backup for mission-critical data on his family’s personal systems. “I recommend it to everyone I talk to,” he said.

In addition to loads of available local storage for backups, three of Michael’s personal computers back up to Backblaze. He makes them accessible to family members who want the peace of mind of cloud-based backup. He’s also set up Backblaze for his father-in-law’s business and his mother’s personal computer.

“I let Backblaze do all the heavy lifting,” he said. “If you ever have a failure, Backblaze will have a copy we can restore.”

Thanks from all of us at Backblaze for spreading the love, Michael!

What’s Next?

The 390 TB is spread across six systems, which has led to some logistical difficulties for Michael, like remembering to power up the right one to get what he needs (he doesn’t typically run everything all the time to help conserve electricity).

Command Central

“Sometimes I have to sit there and think, ‘Where did I store my drone footage?’” Michael said.

To simplify things, Michael is trying to consolidate his setup. And to that end, he recently acquired a decommissioned Storage Pod from Backblaze. He said he plans to populate the 45-bay Pod with the largest hard drives he can afford, which will hopefully make it simpler, easier and more efficient to store all that data.

Well, as soon as he can find a great deal on 8 TB and 10 TB drives, anyway. Keep checking eBay, Michael, and stay in touch! We can’t wait to see what your Storage Pod will look like in action!

Peter Cohen
Peter will never give you up, never let you down, never run around or desert you. He also manages the Backblaze blog.

  • Dave S

    Am I the only one who thinks there’s something wrong with a person who feels the need to archive “every single note, test, project, and assignment from 7th grade through graduate school”? Text book scans? Enough music to listen 24 x 7 for *two years*?

    • dakishimesan

      I think the key is that he enjoys it and it only takes two hours a week. Just like hoarding in real life, if it starts to interfere with your relationships and it is compulsive, then I agree it could be an issue. All of us here are probably a little obsessed with organizing our data to some extent. :)

  • I got the impression the amount of data is unmanageable, especially if you can’t find your stuff (where’s my drone footage?). I feel your recent stuff should be online and older stuff could be kept near-line on a system that only needs to be on when you need access to old stuff.

  • Jason B

    Are there pictures? details on how many drives or how he set it up?

  • karl

    … and it’s all stored on a RAID 0 volume.

    It is a shame we didn’t get a full gallery of pictures of the various computers that store the data. My entire data (documents, photos, music, projects, mail etc.) comes to 260 GB (excluding ISOs and backups of original data), which I thought was large. That remains too large to store online, so I only back up pictures and documents to the cloud [someone else’s computer].

    I like the advice about keeping it simple. Having no directories in directories, but I would imagine the access time would be slow waiting for all the files to load – that’s the main reason I break my content into large directory structures.

    I noticed Windows mentioned once. I would be interested to know which operating systems Michael is using and file systems. I use BTRFS for the error correcting capabilities and easy creation of snapshots upon a Linux server operating system.

    This article has helped me think about the future. Eventually I will eclipse 1 TB so will need to think about adding more drives to create a larger pool.

    • Colin Stuart

      I’m at 4TB of data right now. With my 4x4TB drives in RAID10 I have another 4TB to go. So far filling it has been slow work, which is good. Only 1TB of it is actually super important and irreplaceable. The last 3TB is all..media, which could be reacquired.

      • karl

        I recently separated my data into two.

        -Data
        -Archive

        I try to keep active data in the ‘data’ directory to speed up sync. The ‘archive’ directory tends to be last year’s data and before. In my next server build I will try to implement 4 TB to allow for future expansion.

        • Michael Oskierko

          I do the same but using years for secondary directories. My images directory then lists each year I have images for… all the way back to where I got lazy and created a “Prior to 2004” directory. When I’m backing up to Backblaze I will exclude all but the most recent / “high priority” directories so they get backed up first. Then I slowly relax the exclusions.

          • karl

            Yes, with photos I am more organised. I have separated my photos into years. Like you, I have a ‘2007 and before’ directory. I prefer to organise my back-ups manually so that I know what’s going on. Plus having to do everything manually helps me learn. I find that I take more responsibility for my data instead of using someone else’s service and hoping the data will be there when I need it. Plus, it helps having a passion for IT and the failures in the past. As for 390 TB, that would be somewhat more difficult to keep copies of off-site.

          • Michael Oskierko (Nitricpyro)

            HAHAHA you’re telling me! I’ve straight up, given up, on re-doing any of the configurations for now! I was thinking since storage is so cheap I could drop the single redundant RAID5 and go with a RAID10 but where oh where would I put that data while I destroy and rebuild the array! LOL

          • karl

            Yes that’s a huge undertaking. The reconditioned 45-drive computer will be a great help.

            Here’s a great article on selecting a RAID for the 45-drive machine:

            http://45drives.blogspot.co.uk/2015/11/how-to-decide-on-best-raid_11.html#more

    • Michael Oskierko

      Hi Karl,
      I’m afraid I received the Backblaze Pod between the first and final conversations I had with Peter Cohen. The buckets you see are the stripped parts from the boxes I’m moving to the pod so right now it’s not a pretty picture to look at; not that it ever was with 14 computers strewn around the house!
      My “mission critical” data is 4.4TB strong and it’s safely stored on the Backblaze B1 system! (Thank you Backblaze!) I did a lot of research before picking a company so my internet company, who I don’t want to name so they don’t get any ideas, doesn’t throttle, cap, or block ports. It took a little over a month to get it all on their servers and now it’s just a matter of keeping it updated and adding the 10-20GB of new data each month. I had to FORCE myself to just ignore the message you get when the backup is going to take longer than a month to “reconsider” my solution!
      I do run into issues with names being too long and have to go back and shorten some directory names, but other than that I really do promote and live by the idea that directories should store only directories or only files. In the long run it helps more with organization simply because I know at first glance that if I see a directory, then the contents of that directory can be broken down into smaller categories. Only when I get into a directory with files alone do I know they are all of the same category / type / etc. When I actually get to a file directory there aren’t many items in it. For example, on my main DOCUMENTS partition (which has a few other drives mounted as folders in it) I can use the shorter list of directory listings to “refine” what kind of document I’m looking for. If we are looking for the installer for Handbrake, for example, I can tell you from memory it’s going to be in the …\Documents\Software Files\Video Editing\Conversion Tools\Handbrake directory. In that directory I have the installers for whatever version of Handbrake I want: I use 0.9.9 when I want to specify a target file size for my output file, or the newest version to get the quickest encode speed. The number of directories doesn’t really slow anything down considering how much quicker I can find what I need. Also, I’m using RAID 5 for everything, and prior to getting this Pod each drive has been on its own dedicated SATA connector getting its maximum throughput.
      In all I’m using Windows 7, Windows 10, Windows Server 2008 R2 Enterprise and Datacenter, FreeNAS, Raspbian (Raspberry Pi), Ubuntu Server, and Windows Server 2012 Standard, and now that I have the Backblaze Pod I’ve retired the SNAP servers and their OSes as well as a Windows XP machine.

  • Colin Stuart

    I used to hoard data. Then I lost it all and I had no backups.

    I still kinda hoard, but not like how I used to. And I make backups now!

    • karl

      I am lucky I lost data early on, in my teenage years. I reinstalled Windows 98 because I broke it (always messing around) and lost a handful of pictures and documents. From then on I have kept back-ups and have never lost data again, and hope never to. How did you lose your data and how was it stored?

      • Colin Stuart

        I had two WD Raptor 36 GB drives in RAID 0 (72 GB usable… this was the XP days). I know RAID 0 was risky, but I wanted the speed. That setup worked fine for a few years. One day I was doing some work on the PC and had the fan in front of the drives unplugged. While I was using the computer, the whole thing locked up as those two drives got searing hot. I rebooted, plugged the fan back in, and a week later one of the drives died… it was game over. Everything lost. I tried the freezer trick, no dice. I had an ancient CD I had backed up a few things to years before, but that was it. After that I used the surviving drive by itself for a while… until that one finally bit the dust too.

        Also, probably a good 8-9 years before that I had a bunch of stuff saved and had a huge infection on my PC. That was long before my days of attempting to recover data or clean up viruses so I just reformatted and lost stuff then too. That was only a few GB of stuff though that didn’t really matter.

        Nowadays I keep it all on a centralized NAS in RAID 10, keep a separate drive for a backup of the important stuff, and then have Backblaze B2 to back up the important stuff too. <$5 … such a great deal.

        • karl

          Back then data storage was expensive, so we can be forgiven for not having regular back-ups. I used to write data to floppies and CDs, but this was complex and required time and regular attention. The central model of data storage is essential these days. Also, with one dedicated PC, we are less likely to ‘tinker’ with it. Hell, I am nervous about updating it. I only need two 1 TB hard drives in RAID 1 with an error-correcting file system. In addition to that, I have another 1 TB hard drive for off-site backup and another close by in the cupboard, and I use S3 for online backup. I want to use B2 as well but have had problems with the script working under Linux.

          • Michael Oskierko

            That’s a great idea… I started with a second computer with the same configuration as the first and put it on the exact opposite side of the house. Worst case scenario, I figured I could grab at least one of them in a fire or something, and in less severe situations I could just swap out machines if needed until I could get the second replaced.

          • karl

            I like the opposite side of the house idea. I watched a YouTube video about a NAS setting on fire.

        • Michael Oskierko

          My first data loss was way back when on a massive 80 GB drive, at least massive for the time. That was at least 10 years ago, and only recently I did a search on Google and bought a new PCB for the drive. With the heat you most likely fried something on the board. In RAID 0 you will of course need both drives, with neither of them altered, but replacing the PCB might get you the data back. It worked for me! I now keep that entire 80 GB of data in a single archive for “sentimental” reasons! LOL
