Why business continuity needs to change…
- Published: Thursday, 15 March 2018 09:32
Martin Caddick, FBCI, recently retired from PwC. In this article he reflects on the experience gained in managing teams that delivered well over a thousand business continuity projects. Martin looks at what isn’t working, and what needs to change to improve business continuity.
I cannot help commenting on just how reactive businesses and government are to disasters. This is NOT a good thing. Collectively, we are not proactive or structured in how we plan our business continuity. It seems to me that where we are today with business continuity is the result of a series of reactive responses, which has left us with capabilities that are unbalanced and inadequate.
Over the years my teams delivered well over a thousand business continuity projects for many hundreds of businesses. I’ve seen a lot of different variations of good and bad business continuity, and I’ve seen a lot of change.
In this article, which is based on a webinar I delivered earlier this year, I reflect on all that experience, and try to convey what I think isn’t working, and what needs to change to improve business continuity.
How our history has shaped business continuity
I want to start by looking backwards rather than forwards. If you want to change things for the better, you need to think about where we are now, and how we got here. What has happened in the past shapes people’s attitudes and perceptions relating to business continuity.
Business continuity started to develop from the 1970s in different countries in different ways and for different reasons. For example, countries like Japan and New Zealand developed plans for coping with earthquakes. Businesses in areas prone to flooding tended to have flood plans. Fire regulations encouraged businesses to have evacuation plans. But these were really incident management plans (and perhaps some restoration plans) and tended to be linked to facilities management. They were not true business continuity plans.
It was the ‘Year 2000 bug’ that really changed everything. Year 2000 was indiscriminately global and something that everyone had to take seriously. Year 2000 highlighted a couple of key things:
- Firstly, the level of dependency that business has on IT – and it is not just business, but also the rest of us. We all rely on IT – for drawing out cash, shopping, and Internet services.
- Secondly, it highlighted the need to prioritise. With Year 2000 you couldn’t fix everything and didn’t want to fix everything. You needed to concentrate on what was important.
A good legacy of Year 2000 was an approach – the BIA (the business impact analysis) – which prioritised our efforts and also systematically inventoried what we needed to fix. Another important legacy was the acceptance of the need for contingency plans – what you do if IT (or other service) is not available.
A bad legacy of Year 2000 was that it placed business continuity too close to IT, with IT often still retaining responsibility for business continuity.
Even before Year 2000, things had started to change in the UK when the IRA bombing campaign in the 1990s switched from military targets to economic targets, with attacks like those on the Stock Exchange, the Baltic Exchange, and Canary Wharf. London businesses – especially in the financial services sector – started to plan for keeping their business running (as opposed to just managing the incident) in the face of both direct damage caused by bombs and indirect disruption, such as that caused by exclusion areas.
The lessons learned by London as a result of IRA activity were rarely applied elsewhere until the 9/11 attack on the Twin Towers – which was similar to the London IRA bombs except much greater in scale and in global impact. Since then, we have seen a steady stream of terror attacks in cities across the world - from Paris to Moscow, from Sydney to London.
The impact of these attacks is that any business based in a large city recognises the need to have plans that cope with ‘denial of access’ situations – meaning they need plans for back-up locations or working at home. They also have to plan to work alongside emergency services in a command and control structure.
The Bird Flu scare of 2005, and Swine Flu in 2009 never materialised to the extent feared, but they did highlight to businesses that they rely just as much on people as they do on IT and premises. So as a result, HR and health & safety departments became involved to a greater extent, and BIAs were tuned to identify critical individuals or groups.
Devastating storms, such as Hurricane Katrina in 2005 and Superstorm Sandy in 2012, reinforced traditional business continuity needs: workplace recovery, restoration, supply chain – with the added ingredient of looking after your workforce.
What has been seen in more recent storms, such as Superstorm Sandy, is the growth of social media as a means of keeping track of staff and, indeed, of communicating with them.
Many businesses found that Facebook and Twitter proved the best ways to check whether staff were OK, and for coordinating arrangements for them. But the possible security and privacy implications of this had not been thought through, and in the aftermath of the storms the more sensible businesses started to do so. This reinforces the need for the involvement of HR again, now alongside security and communications staff.
The Financial Crisis of 2008 presented business continuity with different challenges. Few leaders in the financial sector saw the crisis as being in any way related to business continuity. However, the regulators certainly saw the crisis in terms of resilience. I know my contacts at the Bank of England were very worried that the failure of an institution (like Lehman Brothers) might cause major sector-wide disruption. Regulation was introduced to ensure that operations that were critical to the sector were able to carry on business even if their parent businesses failed. This required Recovery and Resolution Plans (RRPs), or so-called living wills.
The process to build RRPs was very similar to business continuity, and could have benefitted from business continuity expertise. But the work was usually done by teams specifically set up to create the living wills and they often ignored the existing business continuity teams.
This is an example of a huge problem faced by business continuity professionals: we are not seen as strategic or high-powered; if indeed we are thought about at all.
The financial sector is still left with an unresolved situation: greater resilience necessitates a better, more joined-up approach, and this remains a real concern in the minds of regulators. But efforts to improve resilience are STILL seen too often in very operational terms – in other words, mainly in terms of IT, security, and compliance – and not in more strategic terms, such as culture or decision making, which is where the root problems may actually be.
Finally, we have cyber. This takes us back to IT again, with an overlay of security. The response to the cyber threat has been high profile and expensive, driven often by national security agencies. The extent of the problem (few businesses escape attack) makes it a board issue. It has brought crisis management and communications to the fore, and demonstrates how the executive response is critical. If you track the share value of firms dealing with a cyber crisis, you can clearly see how share price is directly affected by the words and actions of the leadership.
This is a useful lesson, but cyber distorts other forms of resilience. We risk overdeveloping one form of defence in reacting to one threat (cyber in this case) and missing other threats and remedies that are not currently top of mind.
The shortcomings of business continuity in practice
Business continuity is reactive. Business continuity doesn’t get executive buy-in unless something is seriously wrong, and then the focus is on the current issue (like cyber) to the exclusion of building more comprehensive, consistent, and balanced capabilities.
I have grouped some specific short-comings – the legacy from the past - under three interrelated headings – governance, image, and approach:
Governance
1) Ownership of business continuity (and resilience) within the business varies widely
Where does business continuity fit? We have seen that business continuity sometimes sits under IT, sometimes risk, or security, or compliance, or internal audit, or facilities management, or HR, or health & safety. There is no conventional place to put business continuity, and it suffers from a degree of bias depending on where it sits, because the people who manage it see things just from their own perspective.
An added danger is that business continuity can suffer from corporate politics. Instead of focusing on how to work across protective disciplines, poor governance leads to the protective disciplines squabbling amongst themselves for budget and for influence.
2) Business continuity is not a board concern
Boards are bored by business continuity. It’s never interesting to them unless something goes wrong, and then their interest only relates to what has gone wrong, and who to blame; and sometimes the wrong lessons get learnt.
3) Business continuity is managed at too low a level in the organization
Business continuity managers are middle managers and do not carry enough clout. They are not consulted about big projects or strategies, and resilience is often an afterthought. I have seen clients postpone major SAP implementations at the very last minute, because they realised that, should the implementation be delayed or fail, they had no business continuity or contingency plans. Had business continuity been included as part of the project risk assessments, that could have been avoided.
4) Lack of consistent investment
Investment is patchy. Business continuity as a function is too vulnerable to cuts and redundancies, being viewed as non-essential; and expenditure tends to focus on specific issues that are top of mind (like cyber, and like all the examples in the history of business continuity) rather than on creating an overall capability.
Image
1) Business continuity is seen as a dull cottage industry
This is often our own fault – we report too much detail, and we report on our activity and progress rather than how resilient the business actually is. Our tale of woes about resourcing can quickly make it seem like we are building up mini-empires of people who do not seem to contribute anything to the profits of the business.
2) Business continuity is seen as operational, not strategic
Many colleagues want to see business continuity stay in its box – they ask, what does it have to do with strategy? The answer, of course, is potentially quite a lot. A resilient business can take more risk, and a good understanding of resilience means you make better business decisions and get greater rewards. Try explaining that to the CEO or to the head of strategy!
Approach
1) Our plans tend to relate to offices and not business activities
This is a legacy of threats against locations (like terrorism or extreme weather). This shortcoming causes a strong tendency to dumb down business continuity because we plan based on operational things, rather than strategic priorities. My teams found that the failure to match plans to business needs was one of the most common business continuity review failures.
2) Too much focus on compliance
Business continuity planners tend to turn their approach into a repeatable process – a methodology – based on standards, so they can meet compliance requirements. This takes the emphasis away from intelligent decision-making at the point of disruption and in its aftermath. As a result, we over-plan and obstruct decision making in a crisis, rather than helping it.
3) The approach to planning is reactive rather than proactive
We typically ‘shut the stable door after the horse has bolted’. This is partly driven by a blame culture – “we must make sure it never happens again” - rather than by a proper reflection on what can be learnt, and by a balanced approach to all possible scenarios.
4) Lack of integrated approach or response
Business defences are divided across many functions (HR, facilities management, IT, communications, security - as well as risk and business continuity). Too much time is needed to communicate and coordinate without clear direction, both in a crisis and during the planning process.
Trends that make a difference
Where we are is not good enough. But we also need to look forward, to understand what we will be contending with in the future. We need to understand which trends are making a difference and are likely to continue to do so.
I’ve picked just three mega-trends that stand out to me. But we all need to think this through for ourselves, and identify what is going to change in our own sectors and geographies:
Globalisation and new technology
The first of my picks is, or should be, very familiar, but it is still worth highlighting. Business is increasingly based on a mix of high-speed connectivity, ‘big data’, the ‘internet of things’, and industry 4.0 (1).
I don’t propose to describe all the technology trends in further detail, but the result is that businesses sit within increasingly complex inter-dependent systems.
When something goes wrong in complex systems, the effects of contagion multiply the impact many times over. In addition, you often have no management control over multiple supply sources and services that your business depends upon - for example, if you have outsourced your IT or a business service.
When you outsource, you pass management control to someone else, but the risk is still yours, whatever your contracts say. I have seen clients held responsible by regulators for the failure of an outsourced service provided by a third party, contracted to a business partner contracted in turn to our client. Our client had no visibility and control over this, but the service was in their name and the regulator held them accountable.
The next point is really important to understand. Because of all the interdependencies, complex systems are far more vulnerable to completely unpredictable risks – to so-called black swan events. Such risks in complex systems are foreseeable (in retrospect), but completely unpredictable both in terms of likelihood and in terms of impact. Because of this, many traditional risk management approaches, which rely on likelihood and impact, simply do not work.
A change of approach is needed to prepare for the unexpected and the unpredictable. An updated approach should have at least the two following features:
- An approach for working with partners in the network to understand each of your responsibilities, and the limits to which that extends. This means pooling your knowledge. This means joint planning.
- An approach that focuses less on specific risks and more on the capability to respond, come what may. This means an ability to make well-informed on-the-fly decisions, instead of relying on predefined detailed plans.
It is not that detailed planning is wrong: good planning enables good decision-making, but is not a substitute for it.
Technology advances are not all bad news for resilience. Big data and AI, in particular, offer the opportunity to identify indicators and trends early, and to respond early, to prevent a disaster occurring. The best analogy for this is an autopilot that monitors all the controls and status of an aircraft, stops it stalling, and maintains trim, leaving the pilots to deal with executive decisions and completely left-field events. That’s what 21st century resilience systems should be doing.
The second of my picks relates to the balance between relatively random accidents and deliberate targeted acts that cause disruption.
Both have always existed, and I doubt that there are proportionally more deliberate acts than there used to be. But I think that business continuity, and indeed the risk industry, has historically focused on the accident side. Floods, fires, IT and utility failure are all largely accidental and untargeted, yet are collectively relatively predictable – both in terms of likelihood and impact. The insurance industry covers these well. Without being complacent (because these are, still, very real threats), they are comparatively obvious and we have tried and tested responses.
By contrast, fast-evolving interconnectivity and technology have attracted the interest of criminals, activists, and governments as opportunities to target specific businesses. This sort of disruption is, by its very nature, initially hidden from us; and it is designed by the perpetrators to exploit weaknesses in our ability to recognise it and respond. These threats are less predictable and we have to find new ways to protect ourselves, to detect the attacks, and to respond.
The prescription for dealing with this is the same as before, with one addition:
- Focus on the capability to respond, come what may;
- Work with partners in the network - joint planning; plus
- Use big data and AI to automate early warnings and response.
Regulation and expectation
There’s a negative cycle of reaction in the aftermath of a disaster – whether it is something like the Grenfell Tower fire, or the 2008 financial crisis, or terrorist attacks.
- Something goes wrong, and there is an outcry, and usually something of a witch-hunt … “Who should we blame?”
- Politicians and leaders say …“This must never happen again!”
- Regulators and legislators draw up new rules that mandate approved behaviours and processes, but these often go too far and even become counter-productive.
- Then everyone becomes obsessed with compliance to the new regulations, rather than making sure that next time the right decisions get made at the time.
I believe that this cycle makes future incidents more likely instead of less. We are taking away the element of judgement and replacing it with rules. Most serious incidents I can think of are caused, or made worse, by:
- Failures to follow process, or the blind following of process, because people have switched off their minds;
- Complacency induced by inaccurate or inadequate reporting (often driven by fear of reporting non-compliance).
My team investigated some serious disruptions and found that on the one hand a key part of the problem was ‘false green’ reporting because people were afraid of telling the truth, and on the other hand, worse problems were averted because a few individuals made a number of brave decisions that were not in the rule book.
The great problem of our times is too great a focus and reliance on compliance – compliance to regulation is not resilience.
Many regulators have a more sophisticated understanding of the issues than we sometimes give them credit for. There is a reluctance to add unnecessarily to existing regulation, especially as whenever regulations are introduced, businesses fixate on compliance with the letter of the law, rather than taking on board the objective of the regulation, to the detriment of their decision making.
Some regulators have preferred a question led approach rather than explicit regulation – they try to drive accountability and thoughtfulness within the business by requiring the leadership to respond to questionnaires to explain their approach to resilience. But trying to guide our response in this way is like trying to thread a needle wearing boxing gloves and welding goggles. I suggest building a constructive dialogue with any relevant regulators, and not seeing them as the enemy.
We need to note the power of social media nowadays. It was initially, perhaps, an incredibly powerful way of exposing the truth, but is now an equally powerful tool for disinformation used to feed the fears and prejudices of the credulous, and it can work against you even if you have done nothing wrong. When you add malicious intent into the mix – such as Russian fake postings on Twitter following the Manchester bombing – you get a very unpredictable and combustible situation, which, when applied to a business, could quickly spiral out of control for you.
How this will evolve I can’t tell, but we need to understand that it acts as a multiplier in terms of impact.
We can see that regulation and legislation are often driven by public reaction – which evolves far more quickly and unpredictably now, with social media. In general terms, the first time an incident happens, there is a degree of understanding and forgiveness – an attitude of “there but for the grace of God go I”. But if it is not the first time it has happened, that understanding can evaporate very quickly, because expectations are higher.
What can we do about it?
If we are agreed that we are not where we want to be, then what can we, as business continuity professionals, do about it?
I think that, broadly, there are two courses of action:
Firstly, if you believe that our role and influence is constrained, and is going to stay that way, and that we will never have the opportunity to solve the bigger issues (and nor should we) – then the question is how can we do what we already do better and in a more relevant way?
Secondly, if you believe that we can redefine our roles, or persuade our leadership to redefine their approach, then what can we do that is different?
Doing it better
This course of action is particularly for those of us who believe that business continuity professionals are not, nor ever will be, anything more than a specialised risk mitigation function: those who believe that we just need to make sure that we play our part well. But it is not limited to these people – it applies to all of us.
Bearing in mind everything we’ve looked at before, I think that there are four areas we should be concentrating on. These are:
- Planning for the unexpected
- Raising the profile of business continuity
- Working better with others
- Getting our priorities right.
What does this mean in practice?
- Anticipate disruption better: we can do this by improving horizon scanning. This means picking intelligent key indicators, gathering intelligence from other departments who are likely to see emerging problems before we do, and also taking time out to think about what is going on now and in the future – maybe by team brainstorming sessions or better still, brainstorming with other departments and partners.
- Focus on building capability: Just following a process that ends up with detailed plans is not good enough. We depend on people who understand their roles in a crisis and who are able to make good decisions under pressure. That means training and exercises that use varied scenarios and challenge assumptions, and this is more important than detailed plans.
- Improve our reporting: Our reports should be seen at the Risk Committee level at least. To do that, and to stay on their agenda, we need to make sure that our reports are relevant. We need to avoid excessive detail. The reports need to be relevant to business operations. Most of all, the reports must give a true picture of the ability of the business to withstand and recover from a crisis, rather than being some kind of programme progress report.
- Make our reports relevant to the business: that means having conversations with the business and trying to understand their issues and challenges, and adapting our approach so that it is more relevant to them. The business is a great source of war stories that we need to capture, including where business continuity has helped them. We need to avoid scaremongering, which can work in the short term, but undermines your credibility in the longer term. During both Year 2000 and the Bird Flu scares, the media seized on the very worst-case scenarios. People responded, but the worst cases obviously didn’t happen. As a result, a great many people regard both threats as being largely false alarms, and are mistrustful of anyone who sounds to them like a doom-monger.
- If you want to remain relevant to the business, then a priority is to build and maintain the best networks and working relationships you can within the firm, especially with areas of the business that may be called on, at short notice, in a crisis. Communications is a particularly important area in view of what I have been saying about social media.
- You should not take on responsibility for building all the plans and keeping them up to date. Quite apart from the impossible amount of work you end up with, this stops other people in the business from thinking about business continuity and internalising it, which in turn means that they don’t recognise business continuity situations quickly enough when they arise. Instead, you need to aim to embed business continuity into their operational procedures where possible (making people responsible for, and holding them accountable for, their own business continuity).
- Finally, you need to get away from relying on a department-by-department analysis. Use Strategic Business Impact Analyses (SBIAs) to understand the overarching business priorities – based on client-related business processes (and not parochial departmental priorities, like reporting requirements and internal Service Level Agreements). Make people think, rather than fill in forms.
Doing it differently
This course of action is about raising and changing the role of business continuity professionals, and taking a broader, more strategic, role. This is more of an evangelical path, and it is a path to follow if you already have influence with the decision makers in the business; or if you ever get the opportunity to influence the decision makers.
It may not be as difficult as it might initially appear, because the need to do this is what is driving the emergence of resilience as a theme in the market. But it is vital to understand that resilience is NOT just a rebadged form of business continuity.
Nor is resilience just about the various protective disciplines working together. It is well worth reading the British Standard BS65000 on Organisational Resilience, and also getting the ISO standard 22316. The BS65000 is perhaps slightly more comprehensive than the ISO and is, as much as anything else, an educational document aimed at informing decision makers. The ISO is slightly narrower, but it does represent a consensus view on resilience from across the world, and it will be the de facto international document on the subject.
There are three areas that I have picked out to concentrate on:
Developing an integrated and networked approach to resilience
- Business continuity needs to work effectively in conjunction with the other protective disciplines as part of a coordinated approach to operational resilience. This means that there needs to be shared governance – not necessarily shared reporting lines – but protective disciplines need to be accountable to a single point, probably the Risk Committee. Decisions need to be made, taking account of the plans and priorities of the other protective disciplines, working together wherever necessary.
- To do this, you need to have a shared view of what matters most. The strategic BIA (SBIA), together with risk appetite, is the best mechanism I’ve seen for achieving this. So your SBIA needs to be developed into a tool that can be used across protective disciplines to identify what really matters in a firm, and why – this should be a consistent basis for prioritisation and for investment.
Embedding resilient thinking into corporate strategy and culture
- We need to aspire to have more strategic influence. We touched on this earlier, but we are taking it further here. We need to develop an understanding of what else makes an organization more or less resilient. So, we need to understand the role of culture, of innovation, of governance, of how to improve the organization’s situational awareness, and so on. These are not protective disciplines. This is as much to do with what the organization is, as it is with what the organization does. This is the basis for organizational resilience as defined by ISO22316 and BS65000.
- Following on from that, we need to develop and report on key indicators and trends related to organizational resilience. The truth is that we might develop an excellent understanding of resilience ourselves, but unless we can find ways to report on this in meaningful ways, our understanding will not be shared by the business. Perhaps the single most useful thing we can do is to have a meaningful resilience dashboard that we share with the executive.
Making use of new technologies
Emerging Big Data technology allows us to gather and analyse the vast amounts of data that organizations collect. We know that buried in that data will be all sorts of indicators and trends that point to emerging threats, and to the general health and resilience of the business. If we can apply Artificial Intelligence technology to this, we should not only be able to get some early warning of emerging issues, but also to automate some pre-emptive actions that may help block, or contain, any damage.
I think that this could be a game changer. It’s like my earlier analogy of an auto-pilot, which can automatically prevent stalls and deal with any number of situations and corrections that the pilots may never need to become aware of, leaving them free to try to understand the big picture and fix meta-problems.
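The early-warning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple rolling statistical baseline rather than real AI, and the function name `early_warnings`, the window size, the threshold, and the sample figures are all hypothetical choices made for the example.

```python
from statistics import mean, stdev


def early_warnings(readings, window=5, threshold=3.0):
    """Return the positions of readings that deviate sharply from the recent baseline.

    Each reading is compared with the mean and standard deviation of the
    `window` readings before it (a rolling z-score). The window and threshold
    values are illustrative assumptions, not recommendations.
    """
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        spread = stdev(baseline)
        if spread == 0:
            continue  # a perfectly flat baseline gives no meaningful z-score
        z = (readings[i] - mean(baseline)) / spread
        if abs(z) >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical indicator: daily counts of failed logins
failed_logins = [12, 14, 13, 15, 14, 13, 16, 90, 14, 13]
print(early_warnings(failed_logins))  # prints [7]
```

In practice a flagged reading would feed a human decision or a pre-agreed containment step. Which indicators to watch, and how sensitive to make the threshold, remain judgement calls for each business.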
I think there is a great deal for us to contend with. It is clear to me that where we are is not good enough, and we need new, and better, approaches.
Yet I do believe also that business continuity has achieved a great deal. If you believe that we should keep on with our existing remit – that we stay in our box – then there is still a lot more that we can do to improve. It starts with being more proactive and less reactive.
However, if we choose that path, we will be like insurance, or health & safety. That is, although important and necessary, we will remain a backwater, a specialist area, rarely seen at the top table. Maybe that’s OK, and it is probably the path that most businesses will go down.
Personally, I think that business continuity offers the business more than that – because uniquely we look across the whole business and we try to determine what matters most and work out how to keep that going. That makes it more than, and different from, health & safety, or insurance, or security, or compliance for that matter.
I believe that the concept of resilience is an idea whose time has come – there is a broad recognition in the market of a need to coordinate protective disciplines better and to address some of the things that make us more resilient that are not to do with protective disciplines, like culture and behaviour.
If you believe that we can and should be more to the business, then this second path is the one to choose. We have an opportunity to play a leading part in making businesses more resilient, and I think we should try to take it.
Martin Caddick is an independent adviser on business continuity and resilience, and has worked with many organizations of all sizes, from global corporations to small local businesses, helping them to build sustainable resilience through a pragmatic and flexible approach. He is a thought-leader on resilience, contributing to the standard BS65000 on organizational resilience and to the 20-20 Strategy ‘think tank’ of the BCI.
He has recently retired from PwC, where he built and led its award-winning resilience team, covering organizational resilience, business continuity, crisis leadership, and IT Disaster Recovery. Before that he ran Marsh’s business continuity service, having cut his teeth building the Year 2000 services for Amdahl/DMR.
Martin is a Fellow of the Business Continuity Institute and a member of the Institute of Operational Risk. He was a member of the advisory board for the Resilience Network of London First, and is a regular public speaker and media commentator.
Contact Martin at firstname.lastname@example.org
(1) Industry 4.0 is a term coined by the German government, and is sometimes referred to as the 4th industrial revolution. It is the ability of Artificial Intelligence (AI) to automate much more decision making – with an ability to self-diagnose and correct problems and react to changing situations. It is really the same concept as aircraft auto-pilots and, more recently, self-driving vehicles, but applied to industry and to its support functions.