Disaster Preparedness 2011: Ensuring Cloud service continuity during disasters
John Szczygiel
It’s a distressing feeling when you’re in the middle of an unfolding disaster, and suddenly it hits you -- your plans and resources are being overwhelmed by the event. At this point or soon thereafter, you’ll likely ask yourself a series of questions: How and when will we re-establish services for customers and employees? What will this disaster cost us in terms of revenue, reputation and recovery expenses? How can we avoid being in this position again?
As a SaaS provider of security management systems, our business is based on the availability of our services around-the-clock. Failure of a disaster backup or recovery plan has a direct, immediate impact on our ability to deliver the services our customers expect. We operate on the Internet, so we’re used to managing persistent threats and continuously evaluating and upgrading our contingency plans. However, there is nothing like a real emergency to put those theoretical plans to the test.
Earthquakes and hurricanes
Last August, many of us on the East Coast were presented with the unusual opportunity to test our plans against two very different types of emergencies: the kind you can only anticipate and the kind you know is coming. The earthquake of August 23 was an example of a disaster you can only anticipate. Other examples include technological failures, terrorist incidents, hacking events, fires and loss of key personnel. These events occur without warning and test your processes as they exist. There is no time to beef up your response capabilities or recover from incomplete or outdated plans.
Unlike the earthquake, Hurricane Irene falls into the category of events for which we have some advance warning. Others in this category include blizzards, tornadoes, work stoppages, planned technical work and large public gatherings that interrupt normal routines and traffic patterns.
Both types of events can have severe, even crippling, impacts on providers who are responsible for delivering their services around-the-clock. Below are a few of the things that we find to be important for managing each type of event.
Unannounced events: Since these events don’t have the manners to announce themselves, they are a demanding test of your plans. The single most important factor for recovery is the quality of your advance planning. In your contingency plan, you should consider the following:
- Who is responsible for what aspects of ensuring continuity and recovery?
- How will the team communicate internally and with customers?
- What happens if normal communication paths are disrupted?
- Who takes over if we lose access to key personnel?
- What happens if you take away key assets expected to be available for recovery, such as people, facilities, communication methods, etc.?
Forecasted events: These disasters should be simpler to manage since you will have access to a reasonable amount of information on the timing and severity of the event. In addition to the key elements in place for unannounced events, you now can focus on additional elements, such as:
- Ensure availability of key employees by staggering shifts and dispersing locations;
- Plan for accommodations, food and communications in case of extended events;
- Check with partners on the status of their preparations;
- Communicate what is expected of your employees and customers to assist with the preparation and what communications to expect during and after the event.
Getting ready for our disaster close-up
Both events of August 2011 necessitated activation of our emergency response plans. In the case of the earthquake, we were forced to evacuate our building and re-establish essential services from alternate locations. Our plan had anticipated this occurrence, so we had the infrastructure and procedures in place to make this happen. The biggest challenge in this situation was communications. Phones, cell phones and some Internet providers were down, so having access to multiple redundant communications paths was essential.
Our data centers remained operational after the earthquake, but our team had made preparations for a full switch-over to our disaster recovery center in case aftershocks affected the primary sites. Our data center personnel supported our decision-making with immediate, proactive updates on their operational status, from directly after the earthquake until normal operations resumed.
As Hurricane Irene barreled up the East Coast, we knew our disaster preparation and recovery plans would be tested again. This time, however, we were afforded the opportunity to coordinate back-up plans in advance of the event. Our operations team staged technical support and data center resources and prepared to switch operations to our data center across the country, if necessary. Fortunately, Irene passed over the Washington, DC, area with just some strong winds, and our operations were not impacted at all.
Both events provided valuable live tests of our emergency preparedness and disaster recovery plans, giving us an opportunity to assess the plans we had in place and uncover areas that needed improvement. Continuously evaluating and upgrading these plans is essential to avoiding that queasy feeling that an unfolding disaster just got the better of you.
John Szczygiel is the executive vice president at Brivo Systems. He can be reached at:
