A methodology for developing a business continuity strategy
- Published: Wednesday, 04 May 2016 09:29
By Alberto G. Alexander, Ph.D, MBCI
Once an organization has developed its business impact analysis (BIA) and its risk assessment, it has, according to ISO 22301:2012, to determine an appropriate business continuity strategy (BCS) to be able to resume and recover prioritized activities, at a specified minimum acceptable level. This has to be done taking into consideration the time within which the impacts of not resuming the activities would become unacceptable. The development of a BCS is probably one of the most complicated steps in building a business continuity management system (BCMS). An appropriate BCS demands the usage of a methodological approach and creative thinking. In this article the author presents a methodology for developing an effective BCS and the managerial aspects which need to be considered to stimulate a creative thinking environment.
The objective of this stage of the BCMS process is to develop a business continuity strategy that satisfies the business recovery requirements identified in the BIA stage. The BCS is composed of a set of recovery options to be utilized as alternatives in the event that existing critical resources become unavailable. The business recovery requirements can generally be grouped into four recovery areas (Hiles, 2011):
- Work areas
- IT systems and infrastructure
- Manufacturing and production
- Data and critical/vital records.
Some illustrations of recovery requirements for these areas could be the following:
Work areas:
Arrange an alternate work area for the crisis management team
Arrange an alternate office work area for staff
IT systems and infrastructure:
Arrange an alternate facility for recovering IT systems
Recover damaged systems
Manufacturing and production:
Recover damaged manufacturing equipment
Data and critical/vital records:
Restore damaged critical records
Restore lost data.
This article will describe a framework for developing the BCS. The approach begins by identifying business recovery requirements and ends with a set of recovery options for the BCS. Within the framework, several recovery options are considered as possible solutions to address the recovery requirements. For example, in the area of IT systems and infrastructure, potential recovery options include a hot site, cold site, or warm site. These options generally have different recovery times, costs, and capabilities associated with them.
Only those options that meet the recovery time requirements are selected for further assessment. The framework compares costs and capabilities of the selected options and determines the most appropriate and viable alternative. The final assessment consists of deciding which is the most appropriate recovery strategy. A formal, structured approach should be used for evaluating the pros and cons of the various potential strategies. A strategy selection scorecard is presented which can be used to help ensure a balanced evaluation.
An approach for business continuity development
The BCS development framework consists of four phases:
- Phase A: Recovery requirements identification
- Phase B: Recovery options identification
- Phase C: Availability time assessment
- Phase D: Cost-capability assessment.
Phase A of the framework determines the recovery requirements to be addressed by the BCS. Phase B identifies possible options as solutions to the recovery requirements. Phase C eliminates those options that do not meet the recovery time requirements. With the remaining options, Phase D assesses their cost and capability trade-offs to select the most viable and effective option.
Figure one, below, illustrates the four phases of the BCS development framework.
Figure one: continuity strategy development framework
Phase A: recovery requirements identification
This phase identifies the recovery requirements to be addressed by the BCS. Phase A consists of five steps as shown in figure one. Step 1 produces a list of recovery requirements to be addressed by the BCS. These requirements, which are primarily derived from the BIA, identify:
- Critical business processes and resources that should be the focus of the recovery strategy, and
- Time requirements: maximum tolerable period of disruption (MTPD), work recovery time (WRT), recovery time objective (RTO) and recovery point objective (RPO), for recovering these processes and resources.
Step 1 can also produce additional recovery requirements not included in the BIA. Examples of such additional requirements are the resources needed to support the crisis management center (a facility from which the crisis management team directs recovery efforts). It can be located in an office work area or a hotel conference room.
Steps 2 to 5 of this phase group the recovery requirements, identified in step 1, into different recovery areas. The four most common recovery areas are:
- Work areas
- IT systems and infrastructure
- Manufacturing and production
- Data and critical/vital records.
The recovery requirements for each recovery area are further divided into different categories. Steps 2 to 5 produce detailed requirements for each category corresponding to their recovery area. The list below shows the recovery areas and requirement categories for steps 2 to 5.
Recovery area: work areas
Recovery requirement categories:
- Alternate office work areas for staff to perform work.
- Crisis management center, for crisis management team to conduct recovery efforts.
Recovery area: IT systems and infrastructure
Recovery requirement categories:
- Critical IT systems and infrastructure
- Alternate IT recovery facilities
Recovery area: Manufacturing and production
Recovery requirement categories:
- Critical equipment and resources
- Critical products
- Alternate manufacturing and production facilities
Recovery area: data and critical/vital records
Recovery requirement categories:
- Critical data and off-site data storage facilities
- Critical records and off-site record storage facilities.
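The grouping performed in steps 2 to 5 can be pictured as a small data structure. The following Python sketch is only an illustration: the class name `RecoveryRequirement`, its fields, and the sample entries and time values are hypothetical and do not come from any BIA.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RecoveryRequirement:
    description: str
    area: str        # one of the four recovery areas
    category: str    # requirement category within the area
    rto_hours: float # recovery time objective (hypothetical values below)


def group_by_area(requirements):
    """Group requirements by recovery area, then by requirement category."""
    grouped = defaultdict(lambda: defaultdict(list))
    for req in requirements:
        grouped[req.area][req.category].append(req)
    return grouped


# Hypothetical output of step 1
requirements = [
    RecoveryRequirement("Alternate office space for staff",
                        "Work areas", "Alternate office work areas", 48),
    RecoveryRequirement("Room for the crisis management team",
                        "Work areas", "Crisis management center", 4),
    RecoveryRequirement("Restore the order database",
                        "Data and critical/vital records", "Critical data", 24),
]

grouped = group_by_area(requirements)
for area, categories in grouped.items():
    for category, reqs in categories.items():
        print(area, "|", category, "|", [r.description for r in reqs])
```

Keeping the time requirement on each entry means the later phases can compare recovery options against it directly.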
Phase B: recovery options identification
The objective of phase B is to identify available recovery options for the recovery requirements produced in phase A. As depicted in figure one this phase is divided into several steps, each assigned to a specific recovery area. These steps identify recovery options available for the requirements related to their recovery areas. For example, step 1 identifies three options available for recovering the IT systems and infrastructure recovery area:
- Pre-established: this is where systems are acquired and installed prior to a disruptive event and are used only for recovery purposes.
- Pre-arranged (quick-ship): this is where an agreement is made with a vendor that guarantees the delivery of the required systems within an agreed time following a disruptive event.
- Acquire as needed: this is where the required systems are ordered from a supplier following a disruptive event.
Phase C: availability time assessment
The purpose of this phase is to determine the availability of the recovery options of phase B through an assessment of expected availability time (EAT) of resources specified in the options. For each option, this assessment involves three main steps:
- Step 1: Evaluate the EAT of the resources
- Step 2: Compare the EAT with the recovery time requirements: maximum tolerable period of disruption (MTPD), recovery time objective (RTO), and work recovery time (WRT).
- Step 3: Select the recovery option as viable if its EAT satisfies the recovery time requirements.
As an illustration of this assessment, assume a pre-arranged (quick-ship) acquisition of IT systems is identified as a recovery option by phase B. Also let us assume that the MTPD, RTO and WRT are less than ten days. The first step evaluates the EAT for IT systems and determines its value as seven days, which allows four days of delivery time for the IT systems (based on the pre-arranged acquisition option) and three days of system recovery time. The second step compares the EAT of seven days with the IT system recovery requirements of ten days. Step three selects this option as viable because the EAT of seven days satisfies the ten-day recovery requirements.
The options that are not selected by step three are eliminated while the remaining (selected) options are used as input for the next phase.
Step one of the assessment requires a detailed evaluation of various concerns that can adversely impact the EAT of resources.
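The three steps of this assessment can be sketched in a few lines of Python. This is a minimal illustration and not a prescribed implementation: the class `RecoveryOption` and the function `viable_options` are hypothetical names, the EAT is assumed to be the sum of delivery time and system recovery time (as in the quick-ship example above), and the figures for the second option are invented.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    delivery_days: float  # time until the resources arrive
    recovery_days: float  # time to bring the systems back into service

    @property
    def eat_days(self) -> float:
        """Step 1: expected availability time (EAT) of the resources."""
        return self.delivery_days + self.recovery_days


def viable_options(options, required_days):
    """Steps 2 and 3: keep the options whose EAT meets the time requirement."""
    return [opt for opt in options if opt.eat_days <= required_days]


options = [
    # Figures from the example in the text: 4 days delivery + 3 days recovery
    RecoveryOption("pre-arranged (quick-ship)", delivery_days=4, recovery_days=3),
    # Hypothetical figures for an option that misses the requirement
    RecoveryOption("acquire as needed", delivery_days=21, recovery_days=3),
]

for opt in viable_options(options, required_days=10):
    print(opt.name, opt.eat_days)
```

With a ten-day requirement only the quick-ship option survives; the other is eliminated before phase D.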
Phase D: cost-capability assessment
The recovery options that satisfy the recovery time requirements in phase C are further analyzed and compared in phase D. The purpose of this analysis and comparison is to select the options which best satisfy the recovery cost and capability requirements. The selected options become part of the business continuity strategy.
As a simple example of this phase, assume that phase B identified the following two systems acquisition options for the recovery of critical IT systems and infrastructure:
- Pre-established
- Pre-arranged (quick-ship).
These two options require a further analysis and comparison of their costs and capabilities. Compared to the pre-arranged (quick-ship) option, the cost of pre-established acquisition of systems is generally higher. The recovery effort with a pre-arranged (quick-ship) option, however, is more difficult because the option often requires additional installation and configuration steps compared to the pre-established option. In the pre-established option, such installation and configuration steps occur prior to a disruptive event. Based on this simple comparison, phase D may select the pre-established system acquisition option if the cost is less of a concern compared with the recovery effort.
There are three main steps in phase D. Step 1 defines a list of capability attributes to measure capabilities of recovery options. The capability attributes, which represent specific recovery requirements and preferences of an organization, can include the following:
- Effort - measures how much effort is needed to implement an option
- Quality - measures the quality of products, data and service associated with an option
- Safety - measures how well an option satisfies safety requirements
- Control - measures how much control organizations have over the use and implementation of an option.
- Security - measures physical and information security aspects of an option.
Step 2 of phase D evaluates each option in order to determine its cost and assign values to its capability attributes. Qualitative metrics such as low, medium, or high can be assigned to both cost and capability attributes. A low value indicates that an option has a high cost or low capability, while a high value represents a low cost or high capability.
Figure two shows an example assignment of values (ratings) to cost and capability attributes of the following recovery options for the IT systems and infrastructure recovery area:
IT system and infrastructure acquisition options
- Pre-established
- Pre-arranged (quick-ship).
Alternate IT recovery facility options
- Company owned cold site
- Commercial hot site.
The final step in phase D selects the most appropriate option by comparing the relative costs and capabilities of the options. The discussion below explains the selection process for the example options in figure two.
Compared with the pre-arranged (quick-ship) option, the pre-established systems acquisition option requires less recovery effort; results in a higher quality of system recovery; and provides more control over the recovery process. The cost of this type of option, however, is generally much higher than the pre-arranged (quick-ship) option.
Compared to the company owned cold site, the commercial hot site option requires less recovery effort; results in a higher quality of system recovery; offers better safety compliance; and provides better security controls. This is because the hot site is always equipped and ready for use, whereas a cold site needs additional effort to set up, configure, and test systems and equipment. The safety and security controls are typically provided as part of the services by the hot site vendor. The cost of the hot site option, however, is significantly higher than the cold site option.
Figure two: cost-capability ratings
If the cost is not a primary concern, an organization may select the following options to be included as part of the continuity strategy based on the above comparison:
- IT system and infrastructure acquisition option: pre-established
- Alternate IT recovery facility option: commercial hot site.
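The trade-off described in phase D can be sketched as a small Python comparison. The ratings below are hypothetical and only mirror the relative comparison in the text; they are not the values from figure two. The numeric mapping in `RATING` and the `cost_weight` parameter are also assumptions, added to show how the importance given to cost changes the outcome. Following the convention of step 2, a high rating is always favourable (low cost, low effort, high quality, high control).

```python
# Assumed numeric mapping for the qualitative ratings
RATING = {"low": 1, "medium": 2, "high": 3}

# Hypothetical ratings; "high" is favourable for every attribute,
# so cost="low" means the option is expensive.
options = {
    "pre-established": {
        "cost": "low", "effort": "high", "quality": "high", "control": "high",
    },
    "pre-arranged (quick-ship)": {
        "cost": "high", "effort": "medium", "quality": "medium", "control": "medium",
    },
}


def capability_score(ratings):
    """Sum the ratings of all capability attributes (everything except cost)."""
    return sum(RATING[value] for name, value in ratings.items() if name != "cost")


def select_option(options, cost_weight):
    """Pick the option with the best combined cost and capability rating."""
    def total(name):
        ratings = options[name]
        return capability_score(ratings) + cost_weight * RATING[ratings["cost"]]
    return max(options, key=total)


print(select_option(options, cost_weight=0))  # cost is not a concern
print(select_option(options, cost_weight=2))  # cost matters a great deal
```

With these illustrative ratings the pre-established option wins when cost is ignored, and the quick-ship option wins once cost is weighted heavily, which matches the qualitative reasoning above.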
General recovery strategy considerations
The key to a successful business continuity strategy is to select recovery options based on an evaluation which considers their characteristics and capabilities. For instance, a hot site option requires a careful consideration of: the distance between the recovery site and the primary site, to ensure it is less likely to be affected by the same disaster; the extent of technical support available during recovery; and the response time to have the hot site available once the disaster is declared.
Illustration of general considerations for evaluating recovery options
Alternate IT recovery facility considerations:
- Consider a location that would not be affected by the same disaster.
- Consider the time it would take for the recovery team to arrive at the location.
- For the most time critical systems, consider the use of commercial hot sites with dedicated mirrored systems capabilities.
- To reduce the cost, initially use a hot site to recover critical systems then move to a cold site until the original or a new site becomes available.
- Avoid reciprocal agreements if the system and equipment compatibility is difficult to achieve or maintain.
- Ensure that the IT recovery facility has adequate power redundancy, fire protection controls, and physical security access controls.
- Ensure that the IT recovery facility vendor provides adequate technical support.
- Select an IT recovery facility vendor with a long history of supporting IT recovery facilities.
Alternate work area considerations:
- Consider a work area, which is at a location not expected to be affected by the same disaster.
- Consider a contract with a work area vendor for fixed or mobile office work areas.
- Consider ‘work from home’ as an option.
- Consider the use of existing remote company locations and spaces.
- Ensure that the alternate work area is equipped with adequate data and voice communications lines and equipment, and amenities such as bathrooms and shower facilities.
- Ensure both remote and local personnel can access the alternate office work area.
- Consider the use of hotel conference rooms for the crisis management center.
- Ensure the alternate work area is equipped with office resources such as fax, copiers, white boards, and stationery.
Off-site storage facility considerations:
- Consider a storage location which is at a safe distance away from the primary site, and where it is unlikely to be affected by the same disaster.
- Ensure that the hours of operation of the storage vendor meet the storage and retrieval requirements.
- Ensure that the storage facility can adequately protect the storage media from moisture and temperature damage.
- Ensure that the storage facility has adequate security and safety controls.
- Ensure that the storage facility vendor has proper storage media handling procedures.
- Select a storage vendor with a long history of supporting storage facilities.
IT systems and infrastructure acquisitions considerations:
- For systems with recovery time objectives of less than eight hours, use a pre-established system acquisition strategy where alternate systems are acquired prior to the disaster event.
- For systems with a recovery time objective of less than 72 hours but greater than eight hours, use a pre-arranged (quick-ship) strategy where alternate systems are delivered after a disaster within an agreed time period.
- For systems with a recovery time objective greater than 72 hours, use an acquire-as-needed acquisition strategy where the alternate systems are acquired after the disaster event.
- Use identical replacement systems where possible.
- Ensure replacement systems are fully tested for recovery prior to a disaster.
- Frequently test each system separately and together with other systems.
- Ensure voice and data network systems and equipment have sufficient capacity for recovery.
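The recovery time thresholds in the first three considerations above can be expressed as a simple rule. The Python sketch below is illustrative; the function name `acquisition_strategy` is hypothetical, and because the text does not say how to treat an RTO of exactly eight or exactly 72 hours, the handling of those boundary values is an assumption.

```python
def acquisition_strategy(rto_hours: float) -> str:
    """Map a recovery time objective (in hours) to an acquisition strategy."""
    if rto_hours < 8:
        return "pre-established"
    # Assumption: exactly 8 hours falls into the quick-ship band
    if rto_hours < 72:
        return "pre-arranged (quick-ship)"
    # Assumption: exactly 72 hours is treated like "greater than 72 hours"
    return "acquire as needed"


for hours in (4, 24, 120):
    print(hours, "->", acquisition_strategy(hours))
```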
Manufacturing and production recovery considerations:
- For critical equipment with a short recovery time objective, use a pre-established system acquisition strategy where the replacement equipment is acquired prior to the disaster event.
- Where possible, replace older model equipment with newer models, which are easier to repair and replace in the event of a disaster.
- Harden the alternate manufacturing and production facility with measures such as redundant heating, ventilation, and air conditioning (HVAC) equipment or backup power generators.
- Ensure that the alternate manufacturing and production facility complies with safety and environment regulations.
- Consider maintaining a backup of critical parts at an off-site facility.
- Consider stocking a surplus of raw material/product inventory needed during recovery time at a remote warehouse.
- Establish reciprocal arrangements with other similar organizations for recovery assistance.
- Consider establishing an agreement with a vendor to salvage and restore any damaged equipment and resources in the event of a disaster.
Recovery contracts and service level agreements
Recovery contracts and service level agreements (SLAs) are a means for ensuring proper implementation of the selected recovery options. Written contracts and agreements must be comprehensive and should account for any conditions that can hinder or prevent successful recovery. As a simple example, consider a hot site as an option for an alternate IT recovery facility for highly time-critical systems.
Even though this option satisfies the recovery requirements, the recovery may be hindered if the recovery contract allows the vendor to use untested compatible equipment instead of identical replacement equipment. An effective contract will either restrict the recovery to identical replacement equipment or allow an exception for the use of compatible equipment only if it has been pretested and validated for recovery, prior to a disaster.
Commercial contracts and agreements for work areas, IT systems and infrastructure, and manufacturing and production recovery areas, should clearly ensure the following:
- Availability of the facility, with the required setup and configurations, within the acceptable time period for recovery.
- Adequate frequency and time for testing the facility for recovery.
- Adequate work areas for staff.
- Sufficient recovery time.
- Access to technical support.
- Adequate capacity for voice and telecommunication links.
- Availability of identical replacement equipment.
- Guaranteed access to a recovery facility through alternate arrangements in the event that the facility is being used by any other organization.
- Clearly stated roles and responsibilities for both vendors’ support personnel and the organization’s recovery team.
- Secure and easy access to the facility.
- Process for controlling changes to systems, equipment, facilities and resources.
- Procedures to renew the contract.
While many of the above concerns also apply to reciprocal agreements, there are certain aspects of reciprocal agreements that are unique. Organizations should ensure that reciprocal agreements could accommodate the following:
- Sufficient additional capacity to recover the partner’s systems, equipment, work areas, communication links, etc.
- Security controls to protect sensitive data and information while allowing the other organization to conduct disaster recovery efforts.
- The time required to set up the facility as an alternate recovery facility once the disaster is declared by the other organization.
- Safety conditions stated in the recovery requirements.
- Environmental safety requirements (such as lead free, dust free, chemical free, etc.).
- Sufficient testing time required to validate the recovery process.
- Alternate arrangements in the event that the existing facility or equipment is unavailable.
- Procedures to notify the other party of changes to the facility, systems, and equipment.
The terms and conditions in any recovery-related contracts and agreements must be carefully reviewed by the organization’s legal department. It is also important to investigate the short-term and long-term financial and operational status of potential recovery vendors and partner organizations, and their ability to support the recovery requirements.
Selecting the right strategies
When selecting the right BCS there are some aspects that need to be considered, such as:
- Selecting the right continuity strategies usually involves several trade-offs that management has to consider.
- Typically, the most effective strategies are also the most expensive.
- Conversely, the least expensive strategies are often impractical, risky, or fail to meet operational requirements.
- The challenge is to identify those strategies, or mix of strategies, that are affordable but will provide an appropriate level of risk management.
- A formal, structured approach should be used for evaluating the pros and cons of the various potential strategies.
- The evaluation should be performed by a cross-functional, multidisciplinary group, not by one or two specialists who may have preconceived biases.
- Failing to do so can result in selection of shortsighted strategies that provide a completely false sense of security.
- The use of a ‘strategy selection scorecard’ can be an effective technique for ensuring a balanced evaluation.
Strategy selection scorecard
Once recovery options have been chosen, they need to be evaluated against defined criteria and presented to top management, who decide which strategy to adopt. Many issues are involved in deciding which strategy is the most appropriate.
In figure three a strategy selection scorecard is presented, to be used to evaluate the strategies that have been selected as the most feasible ones. On the left side of the scorecard are the evaluation criteria that will be used to evaluate the strategies. These are just an illustration of the type of criteria that could be utilized. Each organization will decide which criteria are most appropriate to its industry and organizational culture. Each criterion is assigned a weighting factor, and how well the strategy satisfies the criterion is rated on a scale from 0 to 5. The weighted score for a criterion reflects how well the strategy accomplishes it. Then a target score is calculated.
The target is estimated as the weight multiplied by the highest satisfaction rating. The next step is to estimate a satisfaction percentage for each criterion, which is calculated by dividing the score by the target and multiplying by 100%. Finally, an overall satisfaction is calculated for the strategy that was evaluated. Each strategy will have an overall satisfaction percentage, which allows comparisons to be made between different strategies.
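The scorecard arithmetic can be illustrated with a short Python sketch. It rests on two assumptions that the text does not state explicitly: the score for a criterion is taken to be its weight multiplied by its rating, and the overall satisfaction is taken to be the total score divided by the total target. The criteria, weights and ratings are hypothetical and are not the ones in figure three.

```python
MAX_RATING = 5  # highest satisfaction rating on the 0 to 5 scale


def scorecard(rows):
    """Evaluate one strategy.

    rows: list of (criterion, weight, rating) tuples, with weight > 0
    and rating on a 0 to 5 scale.
    Returns per-criterion results and the overall satisfaction percentage.
    """
    results = []
    total_score = 0
    total_target = 0
    for criterion, weight, rating in rows:
        score = weight * rating       # assumed definition of the score
        target = weight * MAX_RATING  # weight x highest satisfaction rating
        results.append((criterion, score, target, 100 * score / target))
        total_score += score
        total_target += target
    overall = 100 * total_score / total_target  # assumed aggregation
    return results, overall


# Hypothetical criteria, weights and ratings for one strategy
rows = [("Cost", 3, 2), ("Recovery speed", 5, 4), ("Ease of testing", 2, 5)]

results, overall = scorecard(rows)
for criterion, score, target, pct in results:
    print(f"{criterion}: score {score}, target {target}, satisfaction {pct:.0f}%")
print(f"Overall satisfaction: {overall:.0f}%")
```

Running the same function on each candidate strategy gives the overall percentages that are compared side by side.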
“It is very important to take into account when using any kind of scoring technique, that it is not really the final score that matters, but the process that the organization goes through to come up with the score” (Hiles, 2011). Using a scorecard allows the participants to focus on the pros and cons of each strategy in a structured fashion, without letting their personal biases distort the selection process.
Figure three: strategy selection scorecard
Stimulating creative thinking
In the continuity strategy development framework presented in figure one, the most complicated phase is Phase B, ‘recovery options identification’. Here, for each recovery requirement identified in phase A, new ways of doing things have to be identified. For this phase to be successful, creativity supporting techniques have to be used effectively to enhance the creative thinking of the individuals working on the development of the business continuity strategy.
To put the framework for developing the BCS into action, a group needs to be chosen and workshops should be facilitated to develop the BCS. A very important concern is the group composition. “Heterogeneous groups are more creative than homogenous groups” (Prince and Krug, 2012). The number of people is also very important. “Groups of three to five people perform better than individuals when solving complex problems” (Franz, 2012).
Two ingredients are necessary here: one is to understand the steps in the framework for developing the BCS, and the other is to facilitate an environment in which creative thinking can flourish. Creativity “involves the generation of new ideas or the recombination of known elements into something new, providing valuable solutions to a problem” (Cyzewski, 2012). The person acting as facilitator plays a very important role: he or she needs not only to be knowledgeable about the technical concerns of the BCS framework, but also to manage different creativity supporting techniques. The main objective of a “creative thinking process is to think beyond existing boundaries, to awake curiosity, to break away from rational, conventional ideas and formalized procedures, to rely on the imagination, the divergent, the random and to consider multiple solutions and alternatives” (Michalko, 2011). Some of the creativity supporting techniques that could be used with the BCS groups to foster the creative thinking process are:
- Brainstorming: this is one of the best known and most used group based creativity processes for problem solving. It is a method of getting a large number of ideas from a group of people in a short time.
- Storyboarding: this is a creativity technique for strategic and scenario planning based on brainstorming and used mainly by groups.
- Lotus blossom: this technique can also be used in scenario planning and is very useful for forecasting strategic scenarios. It is designed for groups and is used to provide a more in depth look at various solutions to problems.
- Mapping process: the use of maps is particularly useful in strategic management thinking in organizations, helping to organize discontinuities, contradictions or differences, and bring pattern, order and sense to a confusing situation, acting as a spatial representation of a perspective.
- The excursion technique: a very useful technique for forcing a group to have new thought patterns to formulate strategies.
- Nominal group technique: a structured method of group decision-making, which allows a rich generation of original ideas, balanced participation of all members of the group, and a rank-ordered set of decisions based on a mathematical voting method. It is an excellent technique for enhancing group creativity.
There are many supporting techniques to help teams develop creative thinking. The recovery options in Phase B need a very creative environment. The main issue is not to replicate the process, but to be creative enough to identify alternative ways to deliver the process output. Here, an eclectic approach to creativity is recommended.
The development of a business continuity strategy needs a methodological framework to guide its steps. In this article the BCS framework has been presented. The steps need to be followed in sequence. No step can be skipped.
It is very important to use the workshop style for putting into action all the steps of the BCS framework. The composition of the group is important. The team that is going to create the recovery options should be made up of individuals who are knowledgeable about the recovery requirements and familiar with the threat scenario.
The team should not be too large; five people is enough.
The management of the steps involved in the BCS framework is very structured and mechanical. The most difficult part is the ‘identification of the recovery options’: this phase is very dynamic, and an unstructured approach is needed to foster creative thinking in the team. Here the role of the facilitator and the correct management of creativity supporting techniques are fundamental.
Alberto G. Alexander, Ph.D, MBCI International Consultant
Dr. Alexander holds a Ph.D from The University of Kansas and an MA from Northern Michigan University. He is the Managing Director of the international consulting and training firm “Eficiencia Gerencial y Productividad”, located in Lima, Perú. He is a member of the Business Continuity Institute.
References
- Hiles, Andrew. The Definitive Handbook of Business Continuity Management. Third Edition. Wiley, London, 2011.
- Prince, George; Krug, Steve. The Practice of Creativity: A Manual for Dynamic Group Problem Solving. Mc Graw, N.Y., 2012.
- Franz, Timothy. Group Dynamics and Team Interventions: Understanding and Improving Team Performance. Free Press, N.Y., 2012.
- Cyzewski, Ed. Creating Space: The Case for Everyday Creativity. Penguin, San Francisco, 2012.
- Michalko, Michael. Creative Thinking: Putting your Imagination to Work. Mc Graw, N.Y., 2011.