Data center outage

By | February 5th, 2014

*UPDATE – 2/12/2014 – The post mortem on the data center outage is complete. The outage began on 02/05/2014 at 9:08 am PST, when the Emergency Power Off (EPO) system was activated, and lasted 1 hour and 43 minutes, until 10:51 am PST. At that point the Backblaze servers began recovering, and they were operational shortly thereafter. No data that had been backed up to the Backblaze Storage Pods before the outage was lost.

The outage began when the EPO system was triggered by a false report of a fire in one area of the data center. The EPO system is designed to activate when it detects a fire, so that emergency workers and data center staff are not at risk of electrocution while responding. There was no fire. The failure occurred while workers were removing old wiring from under the data center floor: they inadvertently exposed and crossed two or more live wires belonging to the currently active system. The EPO system interpreted this as a fire and activated automatically.

To keep this from happening again, the work of removing the old system wiring has been abandoned. In addition, a hardware upgrade is planned that will require a second trigger before the EPO system can activate.

*UPDATE – 2/5/2014 Noon (California time) – All account servers are now up, and backups can continue. You can now install trials, purchase the service, update to the new version, access your account, browse your files, and prepare restores. A few servers are still being checked, so we'll update this post as new information becomes available. – Brian Wilson, CTO

Original blog post below:

Ouch. It's hard to say anything other than that.

On Monday we announced our new data center in Sacramento, and today we launched version 2.5 of our service, with a number of highly requested features. We sent an email announcing the new release to half of our customers (the other half is scheduled to receive it tomorrow) and invited them to upgrade today to get the new features.

Unfortunately, some combination of events brought power down in our new data center at 9am this morning. The timing couldn’t be worse.

We have people onsite at the data center and have been in continuous communication with the data center staff to ensure every step was taken to restore power. As of 11:05 am, power has been restored.

Status as of 11:05am:
* Power has been restored to the data center
* Most services (new trials, purchases, backups, restores) are unavailable for most users
* Already backed up data is still backed up

Now begins an involved process: we will bring up all equipment, check that no systems have been damaged, and bring all services back online. Full recovery will take a while, likely until the end of the day.

However, within an hour we expect that users will be able to do almost everything:
* Install trials
* Purchase the service
* Update to the new version
* Access the account
* Restore files
(Backups may also resume, but they will roll out gradually across customers.)

We will continue to update this post with new information. Please bear with us.

Gleb Budman
Co-founder and CEO of Backblaze. Founded three prior companies. He has been a speaker at GigaOm Structure, Ignite: Lean Startup, FailCon, CloudCon; profiled by Inc. and Forbes; a mentor for Teens in Tech; and holds 5 patents on security.
