More than a plan: establishing a disaster recovery program
By Glen Bricker
Many organizations think that having a disaster recovery plan is all the protection they need from disasters. However, there is so much more to disaster recovery than just a plan! That’s why most industry professionals see disaster recovery as an ongoing program or process that contains a number of distinct elements.
Key process activities include:
- Business engagement and establishment of business requirements (through business impact analyses and risk assessments), resulting in the definition of recovery time objectives, recovery point objectives, and downtime procedures (manual workarounds);
- Identification, evaluation, and selection of appropriate recovery approaches to achieve business requirements, including defined ongoing budget commitments and staff allocations;
- Development of plans for technical recovery and coordination of the recovery effort;
- Execution of ongoing exercising and training.
In addition to process elements, the following governance activities are also typically performed:
- Management engagement through recurring steering committee meetings (management reviews);
- Formalized, recurring planning activities documented in governance documentation;
- Corrective action tracking and prioritization, as well as post-incident reporting and analysis.
These process and governance activities, taken together, represent leading practices for a disaster recovery capability. However, it’s important to note that every organization is different and may require different levels of maturity and formality of the above activities.
In addition, the ideal IT disaster recovery capability supports a broader business continuity program that addresses aspects of recovery beyond IT, such as the loss of a facility, personnel or a supplier.
While most organizations start with IT disaster recovery, the goal is ultimately to address business continuity as well. As a result, building your IT disaster recovery program so it will align effectively with an eventual business continuity capability is another key consideration.
There are multiple standards and methodologies that provide guidance for establishing an IT disaster recovery program. Two of the more comprehensive are ISO 27031 and ITIL IT Service Continuity Management (ITSCM). This article will explore the components in ISO 27031 and ITSCM, their similarities and differences, and their relationship to business continuity management.
ISO 27031:2011 – Information and communications technology (ICT) continuity management, developed originally by the British Standards Institution (BSI), was accepted as an ISO standard in 2011 and represents a management systems-based implementation of an IT disaster recovery program. It has six key principles:
While ISO 27031 is intended for use in the larger context of a business continuity program, organizations have successfully implemented this standard and then later grew into business continuity.
Structured as a management systems-based standard, ISO 27031 has two main components: the management system and the process. The management system is intended to ensure that an organization has a documented process to execute ICT continuity management. It utilizes the plan-do-check-act (PDCA) cycle, consistent with ISO and other management systems-based standards. The process details the components necessary to provide the recovery capability. While the management system described in ISO 27031 can be established solely for IT disaster recovery, there are elements of the process that assume the existence of an overall business continuity program. As you can see below, ICT requirements are established by business continuity requirements, which are typically determined during a business impact analysis.
The process of developing, maintaining, and improving an ICT capability is defined as five high-level components:
- Understanding the ICT requirements for business continuity – with the purpose of determining the ICT continuity services needed to support the business continuity requirements. The process requires understanding the components of critical services in production, their current continuity capability and the gap between current capabilities and business continuity requirements. The analysis should also focus on actions that can be taken to improve the resiliency of the production environment;
- Determining ICT continuity strategies – with the purpose of developing both an overall ICT continuity management strategy and strategies for each critical ICT service that close the gaps identified during the previous phase;
- Developing and implementing ICT strategies – with the purpose of implementing the chosen strategies, including establishing the necessary organizational structure, plans and procedures;
- Exercising and testing – with the purpose of ensuring that the strategies and plans work as intended;
- Maintenance, review and improvement – with the purpose of ensuring that the ICT continuity strategy remains current and appropriate.
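The gap analysis described in the first component can be illustrated with a short sketch. This is not taken from ISO 27031; it is a minimal example that assumes each critical service has a required recovery time objective (RTO) and recovery point objective (RPO) from the business impact analysis, plus a currently demonstrated capability for each. The service names, the figures, and the `ServiceRecovery` and `continuity_gaps` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceRecovery:
    """Hypothetical record for one critical ICT service (all values in hours)."""
    name: str
    required_rto: float  # recovery time objective set by the business
    required_rpo: float  # recovery point objective set by the business
    current_rto: float   # recovery time the current capability can achieve
    current_rpo: float   # data loss window the current capability allows


def continuity_gaps(services):
    """Return (name, rto_gap, rpo_gap) for services that miss a requirement.

    A gap is the number of hours by which the current capability exceeds
    the business requirement. Results are sorted by largest RTO gap first.
    """
    gaps = []
    for s in services:
        rto_gap = max(0.0, s.current_rto - s.required_rto)
        rpo_gap = max(0.0, s.current_rpo - s.required_rpo)
        if rto_gap > 0 or rpo_gap > 0:
            gaps.append((s.name, rto_gap, rpo_gap))
    return sorted(gaps, key=lambda g: g[1], reverse=True)


# Illustrative data only; real figures come from the business impact analysis.
services = [
    ServiceRecovery("order-entry", required_rto=4, required_rpo=1,
                    current_rto=24, current_rpo=12),
    ServiceRecovery("email", required_rto=24, required_rpo=4,
                    current_rto=8, current_rpo=1),
    ServiceRecovery("payroll", required_rto=48, required_rpo=24,
                    current_rto=72, current_rpo=24),
]

for name, rto_gap, rpo_gap in continuity_gaps(services):
    print(f"{name}: RTO gap {rto_gap}h, RPO gap {rpo_gap}h")
```

A list like this is one possible input to the strategy phase: services with the largest gaps are candidates for a new recovery strategy, while services that already meet their requirements (here, email) drop out of the list.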
For those familiar with BS 25999-2:2007, the business continuity management standard, the structure above is consistent with sections four through six of that standard.
Given the similarities to BS 25999, ISO 27031 is the logical choice for implementing a disaster recovery capability in organizations that either utilize BS 25999 for business continuity or have other management systems-based programs. It also provides solid guidance for organizations that have no business continuity or other structure in place to serve as a basis for disaster recovery development. Establishing a management system as part of an ISO 27031 implementation will provide the necessary governance and a platform for the development of a more comprehensive business continuity program.
Many organizations have adopted ITIL IT Service Management (ITSM) as the model for the operation of their IT function. Within ITSM are multiple processes that set standards for the design and operation of IT services. At the core of ITIL is Service Design. As can be seen from the diagram below, Service Design includes multiple disciplines, one of which is IT Service Continuity Management (ITSCM).
ITSCM, much like ISO 27031, is composed of multiple processes intended to ensure that disaster recovery is established, implemented, and maintained over time. While not a management system, it provides similar requirements, especially around testing, review, and continuous improvement.
As with ISO 27031, ITSCM assumes the existence of a business continuity capability as the source of business continuity requirements. The core elements of ITSCM are described below:
- Design Services for Continuity
- ITSCM Support
- ITSCM Training and Testing
- ITSCM Review
The two approaches identified above represent the most common methodologies for building a strong program. If based on legitimate business requirements, adequately funded and staffed, and updated on a regular basis, either ISO 27031 or ITSCM will serve as a solid disaster recovery model. Choosing which to use should be based on the culture of the organization and existing processes that can be leveraged. If management systems exist in other disciplines, or no formal structure exists, ISO 27031 would be a good choice because it includes a governance model as part of the standard. If an organization has adopted ITIL to guide overall service management, ITSCM seems the natural choice. Caution should be taken, however, to ensure that the program has visibility and support outside of the IT organization. Without it, disconnects between business requirements and IT strategies often occur.
While interesting, I felt Glen Bricker’s article on the relative merits of ISO 27031 and ITIL in developing and managing an ICT continuity capability was a little wide of the mark in a number of areas.
Secondly – and I don’t want to appear pedantic here BUT – the standard is not entitled “Information and communications technology (ICT) continuity management”. It is “Guidelines for information and communication technology readiness for business continuity”. Now that’s an important point because in developing the standard a great deal of effort was made to emphasise the role of BCM in defining what ICT readiness capabilities had to be. In other words it’s the BC objectives that drive ICT readiness and not the other way around.
I make this second point because Mr Bricker asserts that, “While ISO 27031 is intended for use in the larger context of a business continuity program, organizations have successfully implemented this standard and then later grew into business continuity.” This approach, which implies defining the ICT continuity approach first and then bolting on BCM as an afterthought, is precisely what ISO 27031 discourages. Indeed, it is that approach which has resulted in so many IT continuity and DR plans being completely divorced from the requirements of BCM, or demanding that the latter be forced into a form totally unsuited to the wider organisation’s needs.
ISO 27031 states that, “As part of its BCM program, the organization will have categorized its activities according to their priority for continuity (as determined by a Business Impact Analysis) and defined the minimum level at which each critical activity needs to be performed upon resumption….” (My emphasis). So it’s quite clear that BC requirements are the starting point for defining what ICT readiness should look like.
Let’s be clear: ISO 27031 complements and supports business continuity, but it is not a business continuity standard, and to use it as the starting point in the development of a wider BC programme is very much a case of the tail wagging the dog. And that’s something that’s invariably a bad idea.
Ron Miller MBCI (Co-editor of ISO 27031)
• Date: 18th October 2011 • Region: US/World • Type: Article • Topic: ICT continuity