Business continuity and disaster recovery: making a start
By Dr. Jim Kennedy, MRP, MBCI, CBRM, CRISC, CHS-IV
Over the last six to seven years the world has witnessed death, devastation, and destruction with ever-increasing frequency and ferocity. The great Sumatran tsunami in 2004, Hurricane Katrina in 2005, the eruptions of the Eyjafjallajökull volcano in Iceland in 2010, and most recently the 2011 tsunami in Japan are examples of events that have severely affected both public and private sector operations. Looking ahead, events of this kind seem to be becoming more frequent and their impacts more extensive.
These events, together with the wildfires in California, the floods in the US Midwest and the UK, and power blackouts across the US and Europe, have had adverse effects on small, medium, and large businesses worldwide. Despite all of this, surveys taken in 2010 and 2011 continue to show that many businesses across the world still do not have business continuity and/or disaster recovery plans in place to mitigate the impact of adverse events on their operations.
As the practice lead for business continuity and information security for several major consulting organizations, I have seen first-hand the lack of contingency planning preparedness. In this article I hope to provide some insight into how to protect a business operation properly and give it the resilience needed to survive the aftermath of a significant business-impacting event.
I have provided business continuity and disaster recovery consulting services to many industries, including telecom, pharmaceutical, manufacturing, petrochemical, energy, and utility power, and all seem to face the same set of challenges:
1) Lack of understanding or commitment by senior management to adequately staff and fund a business continuity and disaster recovery program within their organization.
2) Lack of understanding by functional management of how important business continuity and disaster recovery planning is to their ability to continue operations and perform their mission.
3) Lack of information technology management understanding of the inter-relationship of business continuity planning with their role of providing technology to continuously support their organization’s mission.
This paper will focus on items 2 and 3 above; I will leave item 1 and a discussion of business continuity management in general for a future article.
Before I begin to discuss the necessary ingredients of contingency planning, and how each component is developed and complements the others, I believe that definitions are in order.
Business continuity planning - the activity performed by an organization to ensure that all critical business functions will be available to customers, suppliers, regulators, and other entities that must have access to or rely upon those functions.
Disaster recovery planning - the process, policies, and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural, human-induced, or technological disaster has occurred. Disaster recovery is a subset of business continuity.
Business unit recovery planning - the component of business continuity which deals specifically with the relocation of key organization personnel in the event of an adverse event, and the provision of essential records, equipment, supplies, work space, communication facilities, computer processing capability, etc.
Business impact assessment - an impact analysis that differentiates between critical (urgent) and non-critical (non-urgent) organization functions/activities. A function may be considered critical if the implications for stakeholders of the resulting damage to the organization are regarded as unacceptable. Perceptions of the acceptability of disruption may be modified by the cost of establishing and maintaining appropriate business or technical recovery solutions. A function may also be considered critical if dictated by law or regulations.
For each critical (in-scope) function, two values are then assigned: a recovery point objective (RPO) and a recovery time objective (RTO).
The recovery point objective must ensure that the maximum tolerable data loss for each activity is not exceeded. The recovery time objective must ensure that the maximum tolerable period of disruption (MTPD) for each activity is not exceeded.
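The relationship between these objectives and their limits can be sketched in a few lines of Python. This is only an illustration: the class, the field names, and the choice to express data loss in hours are assumptions made for the example and do not come from any standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CriticalFunction:
    """One in-scope function from the BIA (all field names are hypothetical)."""
    name: str
    rto_hours: float            # recovery time objective
    mtpd_hours: float           # maximum tolerable period of disruption
    rpo_hours: float            # recovery point objective
    max_data_loss_hours: float  # maximum tolerable data loss, as hours of data


def objective_violations(fn: CriticalFunction) -> List[str]:
    """Return the objectives that break their limit; an empty list means none do."""
    problems = []
    if fn.rto_hours > fn.mtpd_hours:
        problems.append(
            f"{fn.name}: RTO {fn.rto_hours:g}h exceeds MTPD {fn.mtpd_hours:g}h"
        )
    if fn.rpo_hours > fn.max_data_loss_hours:
        problems.append(
            f"{fn.name}: RPO {fn.rpo_hours:g}h exceeds tolerable data loss "
            f"{fn.max_data_loss_hours:g}h"
        )
    return problems


# Illustrative figures only: a 12-hour RTO cannot satisfy an 8-hour MTPD.
order_entry = CriticalFunction(
    name="order entry", rto_hours=12, mtpd_hours=8, rpo_hours=1, max_data_loss_hours=4
)
for problem in objective_violations(order_entry):
    print(problem)
```

A check of this kind is most useful during BIA review, where it flags objectives that were agreed in a workshop but contradict the tolerances the same workshop recorded.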
Next, the impact analysis results in the recovery requirements for each critical function. Recovery requirements consist of the following information:
High level mitigation strategies – the function of developing strategies that satisfy the business recovery requirements as identified in the business impact phase.
Phase one – business impact assessment (BIA)
Regardless of which business continuity standard (e.g., BS 25999, NFPA 1600, or ISO 27031) you decide to use as a guide, all of them treat an impact assessment as fundamental to a successful business continuity or disaster recovery plan.
As we all know, resources (people, time, money) are always limited within any organization. So for companies to protect their assets and provide adequate business continuation in times of adverse events, mission critical functions need to be identified and recovery resources need to be focused on them. A business impact assessment is undertaken to accomplish this.
The outcome of this phase is the identification of the most critical functions / processes / activities; IT infrastructure necessary to accomplish those functions / processes / activities; potential risks associated with those functions / processes / activities; and the development of an understanding of the timing needed to maintain a certain level of operations to enable business to accomplish its mission(s).
This is an area where the business continuity and disaster recovery plans can potentially be sent off in the wrong direction. It is also an area that, I have found, seems to be foreign to many business operation leaders. Despite coming from the most prestigious MBA schools and even being the best and brightest in marketing, sales, finance, supply chain, etc., these leaders in many cases do not have a clue as to what is important to their operation’s mission and are often unable to define or agree upon the needed RTO, RPO, and MTPD levels, the most important output of the BIA. Further, they do not seem to understand the importance of business continuity and disaster recovery planning to their operation. When conducting a BIA of a business unit I have received questions such as: “Why do we need to do this? We have never had a failure in the past five years.”
To overcome this lack of understanding you will probably need to provide some internal awareness training for these leaders, so that they understand why business continuity and disaster recovery planning is taking place, what the various phases are, and what the BIA is looking for. I have found that conducting business continuity and disaster recovery planning workshops with the organization’s leaders and decision makers, as a precursor to the actual planning activities, goes a long way toward making those activities successful. The training should include identification of risks, costs of mitigation strategies, and how to make RTO and RPO decisions, in addition to other things they, as business leaders, need to be thinking about for the recovery of their respective organizations and operations.
Generally, I have found that risks come down to the same four general categories unless the business is unique. Those four most commonplace risks are:
1) Loss of a part of a business function or process brought about by either a technical or an operational failure.
2) Loss of an entire function and/or process brought about by a failure of a complete computing platform or operational failure or inaccessibility involving a portion of the building where the function and/or process is performed.
3) Complete loss or inaccessibility of the data center and/or general business operations due to a technological, human related, or natural disaster.
4) Complete loss or inaccessibility of multiple data centers and/or general business operations due to an event that is geographically widespread (e.g., the 2003 power blackout and the 2011 tsunami in Japan both affected large geographical areas and required recovery outside of the impacted areas).
Once the BIA is concluded, the essential elements that a business operation needs in order to recover from an impacting event will have been identified (both internal and external, as would be the case with any cloud computing services used by the business), along with interrelationships with other functions and operations, any laws and regulations governing recovery times, and the recovery requirements. The timeframes for recovery will also have been identified. Generally the BIA is signed off by the senior leader of a specific business operation. However, if the BIA spans several business functions (e.g., HR, finance, sales, supply chain, logistics), then more senior management will need to sign off on the critical functions, timing, and, most importantly, mitigation costs.
Phase two – mitigation strategy development
During this phase the recovery strategies are developed that satisfy the business recovery requirements (RTO, RPO, and MTPD) identified in the BIA phase. This is also where business leadership will decide the amount of resources (people, money, and time) that will be spent addressing the recovery requirements. The rule of proportionality is in effect here. That is, the amount of resources committed to implementing the mitigation strategy should be less than or equal to the actual impact costs if an outage to the business operation were to occur. The frequency of potential outages and the financial and operational impact to the business if an outage were to occur are all taken into consideration. All options have different recovery times, costs, and capabilities associated with them. Business leadership will be presented with the various recovery options and costs, and will ultimately select the recovery strategies that best meet the business’ requirements.
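One way to make the rule of proportionality concrete is to compare each option's yearly cost with the loss the business would expect to suffer per year without it. The Python sketch below assumes that expected loss can be approximated as outage frequency multiplied by impact per outage; that formula, the option names, and every figure are illustrative assumptions, not part of any standard or of a real engagement.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryOption:
    """A candidate recovery strategy (hypothetical fields for illustration)."""
    name: str
    annual_cost: float  # yearly cost to implement and maintain the option
    rto_hours: float    # recovery time the option can deliver


def expected_annual_loss(outages_per_year: float, impact_per_outage: float) -> float:
    """Assumed approximation: frequency of outage times impact of one outage."""
    return outages_per_year * impact_per_outage


def proportionate_options(
    options: List[RecoveryOption],
    required_rto_hours: float,
    outages_per_year: float,
    impact_per_outage: float,
) -> List[RecoveryOption]:
    """Keep options that meet the RTO and cost no more than the expected loss."""
    ceiling = expected_annual_loss(outages_per_year, impact_per_outage)
    return [
        o for o in options
        if o.rto_hours <= required_rto_hours and o.annual_cost <= ceiling
    ]


# Illustrative figures: one outage every two years, 400,000 impact per outage.
candidates = [
    RecoveryOption("hot site", annual_cost=500_000, rto_hours=4),
    RecoveryOption("warm site", annual_cost=120_000, rto_hours=24),
    RecoveryOption("cold site", annual_cost=30_000, rto_hours=72),
]
shortlist = proportionate_options(
    candidates, required_rto_hours=24, outages_per_year=0.5, impact_per_outage=400_000
)
print([o.name for o in shortlist])
```

With these figures the hot site is excluded on cost and the cold site on recovery time, which is the kind of trade-off business leadership is asked to weigh. In practice operational and reputational impacts that resist a monetary figure would also enter the decision.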
Phase three - planning
Once the recovery strategies have been identified and selected, the process moves on to the planning phase. Most organizations require three types of plan: a disaster recovery plan, business unit plans, and an entity-wide business continuity plan.
I will leave the overall entity-wide business continuity plan for last.
Disaster recovery plan
The disaster recovery plan is most often left to the IT department to develop, and in my experience IT has for the most part accepted the challenge and provided very thorough and comprehensive technology recovery plans for its operations (e.g., data center, phone switch, wide area and local area networking, computing environment, etc.). IT staff are the most knowledgeable about the technology and are the best prepared to develop and test technology-focused disaster recovery plans.
They are sometimes not so good at developing their own business unit plans, which cover where they will operate from, succession planning, what human resources and training are required for recovery efforts, and what other resources (phones, desks, forms, personnel accommodations, etc.) will be required to maintain any service level agreements. So I have found it important to the success of disaster recovery plans that IT business unit plans are also properly developed, made operational, and periodically exercised.
Remember the disaster recovery plan covers the restoration of technical functions that the business needs to properly function – no more and no less.
Business unit plan
The business unit plan focuses on what is required to restore the mission critical functions / processes / activities for a particular business operation. As such the people developing this plan need to be intimately knowledgeable in the inner workings of that business function. Many businesses incorrectly task IT with this recovery planning effort as well and history has proved that a plan developed in that manner most often fails. This plan needs to be tasked, developed, reviewed, exercised, and signed off by the people in the functional area to be recovered.
The business unit leaders can either have personnel from their respective areas educated in what it takes to develop a business unit recovery plan or they can obtain the services of expert consultants to guide the effort. In either case the information needed for the plan needs to come from the subject matter experts (SMEs) within their own organization and final oversight and signoff needs to come from the senior management of that unit or function, and the project champion for the overall business continuity effort.
The business unit plan will cover any workarounds or manual processes that can be used in the early stages of an impacting event. It also covers the minimal manpower requirements needed for recovery, office equipment needed, forms and documents required, and the location from where recovery will take place.
It also includes contact lists of employees needed for recovery, vendors who can supply needed equipment and supplies in the time of need, and customers (internal or external) to let them know that an incident has occurred and how they will be provided goods and/or services during the recovery period.
Communication with senior leaders is also a product of this plan, so that they can be kept abreast of recovery operations, know what they might be needed to do, and keep stakeholders and possibly the public apprised of recovery operations.
It is also important to ensure that any IT disaster recovery times dovetail with the business unit plans’ needs for technology to be available. Systems and networks must come back on-line when the business units need them as part of their recovery plans. Many times IT will develop its recovery plans in a vacuum (without consulting the business units) and plan to bring up a particular computing platform in 24 hours when the business unit needs the capabilities of that platform less than 8 hours after an incident occurs. IT built and tested its plan and it was successful (it met recovery time and point objectives), the business unit tested its plan and it was successful, but the overall recovery of operations failed because the business had planned all workarounds and manual processes around technology being available within 8 hours. IT recovered 16 hours later than required for the business to resume operations.
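A simple cross-check of the two sets of plans would catch this mismatch before an exercise does. The Python sketch below is a minimal illustration; the function name, the dictionary layout, and the platform names are hypothetical, and the 24-hour and 8-hour figures are taken from the example above.

```python
from typing import Dict, Optional


def recovery_gaps(
    it_rto_hours: Dict[str, float],
    business_need_hours: Dict[str, float],
) -> Dict[str, Optional[float]]:
    """Map each platform to the hours by which IT recovery lags the business need.

    Platforms that IT recovers in time are omitted. A platform the business
    needs but IT has no plan for is reported with a gap of None.
    """
    gaps: Dict[str, Optional[float]] = {}
    for platform, needed in business_need_hours.items():
        planned = it_rto_hours.get(platform)
        if planned is None:
            gaps[platform] = None
        elif planned > needed:
            gaps[platform] = planned - needed
    return gaps


# IT plans to restore the order platform in 24 hours; the business needs it in 8.
it_plan = {"order platform": 24, "email": 4}
business_needs = {"order platform": 8, "email": 8}
gaps = recovery_gaps(it_plan, business_needs)
print(gaps)
```

Running the comparison whenever either plan changes keeps the two sets of recovery times from drifting apart between exercises.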
Entity-wide business continuity plan
As indicated earlier, corporations face a variety of risks and potential crises that result in more damage more quickly than ever before. Larger area power failures, larger and fiercer storms, and terrorist threats across the globe all have the capabilities to render a successful albeit unprepared business non-operational in a very short period of time. The unprepared business is placed at a greater risk in the event of an adverse event.
The enterprise-wide business continuity planning program is used to assess and manage the effects of a significant emergency disruption on the business’ operations in an effort to provide continuity of critical business functions. Such critical business functions include entering of client orders, completing regulated transactions and providing employees and customers access to mission critical processes and functions.
As indicated previously the business continuity planning program begins with each business unit's assessment of its business continuity risk. This process encompasses all aspects of key business functions. The assessment defines, for each business process, its criticality and a method for recovery. Individual business unit plans are then reviewed and updated annually, or as significant business changes occur.
The enterprise-wide plan is designed to account for the actions the business will take in the event of disruptions of varying scope and uniqueness. This includes incidents involving a single office building where any of the business’ offices may reside, as well as city-wide or regional disruptions. It also includes information regarding people loss, where staff members may be unable to work at their normal business location, and it establishes the governance model that will be used to keep all contingency plans up-to-date and actionable.
The enterprise-wide business continuity plan needs to be designed to allow the business to continue its operations, likely at a reduced capacity, and safeguard the interests of its stakeholders, customers, and employees.
The idea of contingency planning is not unique. It should not come as a surprise that business operations need to have current and active contingency plans in place. IT should not be tasked with all of a business’ contingency planning efforts. It is up to the business functional line managers to make sure that plans are implemented and actionable. It is up to senior management, as part of its fiduciary responsibility, to make sure that adequate resources are made available to provide for business resilience and availability.
Remember, the question is not *if* an adverse event will impact a business; the real question is *when* the event will occur.
Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV, CRISC has a PhD in Technology and Operations Management and is the chief consulting officer for Recovery-Solutions. Dr. Kennedy has over 30 years' experience in the information security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of two books, ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and author of the e-book, ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic’. Dr. Kennedy can be reached at Recovery-Solutions@xcellnt.com
• Date: 17th August 2011 • Region: US/World • Type: Article • Topic: BC Plan Development