How to define your recovery time objectives
- Published: Friday, 20 February 2015 09:53
By Charlie Maclean-Bristol FBCI FEPS
Defining the recovery time objectives (RTO) for your activities is one of the most critical things the business continuity manager will carry out. Get them wrong and the whole basis for your recovery strategy is flawed. Often, rather than being an objective assessment, the RTO is driven by internal politics and by managers wanting their part of the organization (and hence themselves) to be seen as important.
For a long while I have wondered if there was any scientific way, or even a rule of thumb, for defining your RTOs but I have never come across one. A while ago I reached out to the BCMIX LinkedIn Group to ask how members went about defining their RTOs. I got lots of explanations of the process for defining them but no set rule. Most people said that defining RTOs was a combination of common sense, knowledge of the organization, and experience. These are all very good but how is a beginner going to get that experience?
In the absence of any set method of defining RTOs here are my thoughts on the subject:
The first step for defining the RTOs is to define the ‘maximum tolerable period of disruption’ (MTPD) for the activity in question. For those of you not familiar with MTPD, it is a term used in the Business Continuity Institute’s Good Practice Guidelines 2013, which defines it as “the time it would take for adverse impacts, which might arise as a result of not providing a product/service or performing an activity, to become unacceptable”. There is an equivalent concept outlined in ISO 22301. This can be found in section 8.2.2 where the standard asks you to assess the impacts over time, as part of the business impact analysis.
In setting the MTPD you are looking for the “…duration after which an organization’s viability will be irrevocably threatened....”
I think that many people struggle with the concept of the MTPD and for most organizations it is very difficult to define precisely. There are very few organizations that could survive 30 days but not 31 days, for example.
On a slow news day an organizational incident could be the number one item in the news and the company’s reputation may be irretrievably damaged, while on another day the same incident might not even be picked up by the media.
There is also the issue that some organizations, especially government organizations, may not be allowed to fail; or when a small part of a multinational fails it is very unlikely to cause the demise of the whole organization.
So, in defining the MTPD, we are looking for a ‘ball park’ time at which the organization's viability is irretrievably threatened.
As we can see above, ‘irretrievably threatened’ means different things to different organizations. Before defining the MTPD it is important to agree what the criteria for the organization would be. These could include:
- Reputation being irretrievably damaged and the company being unable to sell to new customers;
- Running out of money and the organization going bankrupt;
- The entire management team being replaced, or the organization being closed down or run by another part of the organization;
- Failure of delivery of service to customers resulting in existing customers leaving the organization and being unable to attract new ones;
- A genuine chance of the organization killing or injuring a member of the public.
The importance of these criteria is that they have to be tailored to the individual organization.
The MTPD of an activity is an estimate that should sit within a rough time frame, say one to three months, or two days to two weeks. Depending on the type of organization I normally define six or seven timescales for the MTPD to fall within. For example, for an ‘office based organization’ which typically works 9am to 5pm five days a week, I would use the following timescales to see where the MTPD would fall:
- 0-24 hours
- 24 hours to 3 days
- 3 days to 1 week
- 1 week to 2 weeks
- 3 weeks to 1 month
- 1 month to 3 months
- 3 months+
For an organization which works 24/7, such as a hospital, I would change the timescales to perhaps:
- 0-1 hours
- 1 hour to 6 hours
- 6 hours to 24 hours
- 1 day to 3 days
- 1 week to 2 weeks
- 4 weeks to 1 month
- 1 month+
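As a rough illustration, the office-based timescales above could be written down as data and used to look up which band an estimate falls into. This is only a sketch: the names `OFFICE_BANDS` and `find_band` are hypothetical, a month is assumed to be 30 days, and each band is assumed to include its lower bound but not its upper bound. A 24/7 list could be encoded in the same way.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY
HOURS_PER_MONTH = 30 * HOURS_PER_DAY  # assumption: a month is treated as 30 days

# (label, lower bound in hours, upper bound in hours or None if open-ended),
# copied as listed for the office-based organization.
OFFICE_BANDS = [
    ("0-24 hours", 0, HOURS_PER_DAY),
    ("24 hours to 3 days", HOURS_PER_DAY, 3 * HOURS_PER_DAY),
    ("3 days to 1 week", 3 * HOURS_PER_DAY, HOURS_PER_WEEK),
    ("1 week to 2 weeks", HOURS_PER_WEEK, 2 * HOURS_PER_WEEK),
    ("3 weeks to 1 month", 3 * HOURS_PER_WEEK, HOURS_PER_MONTH),
    ("1 month to 3 months", HOURS_PER_MONTH, 3 * HOURS_PER_MONTH),
    ("3 months+", 3 * HOURS_PER_MONTH, None),
]


def find_band(estimate_hours, bands=OFFICE_BANDS):
    """Return the label of the band containing the estimate, or None."""
    for label, lower, upper in bands:
        # Assumption: bands are half-open, [lower, upper).
        if estimate_hours >= lower and (upper is None or estimate_hours < upper):
            return label
    return None


print(find_band(36))   # 24 hours to 3 days
print(find_band(600))  # 3 weeks to 1 month
```

Because the bands are copied exactly as listed, an estimate between two and three weeks matches none of them and the function returns `None`.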
Even if the MTPD of an activity is a rough estimation, we can still use the MTPD to differentiate between different activities. Some activities will have a longer MTPD than others and this starts to give us an indication of which activities are the first to be recovered and which can wait a while.
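A minimal sketch of that comparison, assuming each activity has already been given the lower bound of its MTPD band in hours. The activity names and figures here are purely hypothetical and are not taken from any real business impact analysis.

```python
# Hypothetical activities mapped to the lower bound of their MTPD band, in hours.
activities = {
    "Payroll": 504,            # band of 3 weeks to 1 month
    "Customer helpline": 24,   # band of 24 hours to 3 days
    "Staff training": 2160,    # band of 3 months+ (month assumed to be 30 days)
}


def recovery_order(mtpd_lower_bounds):
    """Return activity names sorted from shortest to longest MTPD."""
    return sorted(mtpd_lower_bounds, key=mtpd_lower_bounds.get)


print(recovery_order(activities))
# ['Customer helpline', 'Payroll', 'Staff training']
```

The ordering is only an indication of what to recover first; it does not replace judgement about dependencies between activities.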
It is important that, if you are going to use MTPD to define your RTOs, you use the same criteria each time to define your MTPD. If you use different scenarios for ascertaining the MTPD for different activities then it will very much skew your findings. I use the phrase “if the entire activity was to stop”, and disregard any particular scenario in which the failure of that activity led to the organization being irretrievably damaged.
Once we have the MTPD of all our activities, we are in a position to start to estimate our RTO. I say 'estimate' here as the RTO may change slightly after the design (strategy) stage of the business continuity lifecycle. In looking at an appropriate strategy to recover the activity, you may have to adjust your RTO for practical or financial reasons. In deciding on the RTO we know that the RTO must sit somewhere on a timeline between the start of the incident and the MTPD. See below:
If the RTO is too close to the incident then we might be recovering the activity too quickly and therefore wasting money. As an extreme example: an activity which has an MTPD of one month+ does not need an RTO of two hours. On the other hand, if the RTO is too close to the MTPD and the RTO is missed, there is no time left to recover before the MTPD is reached, and so the organization may be irretrievably damaged.
I have come up with a rule of thirds for defining the RTO. The RTO of our activity sits in the middle third of the timeline, as shown by the green area in the figure below.
By way of an example, as shown in Figure three (below), the MTPD is in the time frame of three weeks to one month. We take the lowest time of the MTPD timeframe (three weeks) and then divide the timeline into three parts of one week each. The RTO sits in the middle part, somewhere between one week and two weeks. It is then up to your judgement whether it sits closer to one week (say, 8 days), closer to two weeks (say, 12 days), or in the middle.
If the MTPD of a different activity were within the time bracket of six to 12 hours then the RTO would be sometime between two and four hours.
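The arithmetic behind the rule of thirds can be sketched in a few lines. The function name `rto_window` is hypothetical, and rejecting a lower bound of zero (as in a band starting at the incident itself) is an assumption of this sketch, since the rule gives no usable window in that case.

```python
def rto_window(mtpd_lower_bound):
    """Return the (earliest, latest) RTO under the rule of thirds.

    mtpd_lower_bound is the lowest time in the MTPD band, in any unit
    (hours, days or weeks); the result is in the same unit.
    """
    if mtpd_lower_bound <= 0:
        # Assumption: a band starting at zero leaves no middle third to use.
        raise ValueError("the MTPD lower bound must be positive")
    third = mtpd_lower_bound / 3
    # The middle third runs from one third to two thirds of the timeline.
    return third, 2 * third


print(rto_window(3))  # MTPD band of three weeks to one month, in weeks: (1.0, 2.0)
print(rto_window(6))  # MTPD band of six to 12 hours, in hours: (2.0, 4.0)
```

The window only narrows the choice; where the RTO sits inside it remains a matter of judgement.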
Once we have decided on the RTO then we should go forward to the ‘design’ stage of the business continuity lifecycle. Within the design stage we look at, and then agree, the recovery strategy for that activity. It must be noted that the RTO may change in the design stage as it may have to be adjusted to make a particular strategy work. Once the RTO has ‘gone firm’ I believe that all the RTOs of the activities across the organization should be signed off by a senior manager. This is to check that they meet the requirements of the organization and are acceptable to its senior managers.
I have taught the rule of thirds on a number of training courses and I would be interested if there are any comments on it.
Charlie Maclean-Bristol FBCI FEPS is director of PlanB Consulting.