
Can your call centre handle a disaster?

By Jeff Weil

Your initial reaction to the question in the title may be “yes, of course”. You know that a crisis of any scale – from a power outage to a natural disaster – can put you out of business. To prepare for the worst, you have a recovery site ready and waiting, your systems are backed up regularly and you test your failover site every now and then.

But is this enough when the survival of your business is at stake?

With more and more companies implementing relationship management strategies, call centres are increasingly viewed as a critical tool for binding customers to an organisation and, as a result, generating more revenues from them. Given this strategic importance, ensuring the continuity of call centre operations is more fundamental than ever.

Although many companies use standard network management tools to track the ‘up/down’ status of their infrastructure, this isn’t enough for contact centre operations. To have a successful failover site, businesses must test and monitor the backup systems rigorously to ensure that customer calls are handled properly and that systems, applications and agents are interoperating normally. This is especially critical as contact centres move to an IP-based infrastructure, which allows them to consolidate networks, virtualise agents and add new applications.

What business continuity and disaster recovery mean for the call centre

While the terms are often used interchangeably or together, for the purposes of this article we are defining business continuity as maintaining or re-establishing all critical business operations following a disruption. Disaster recovery is an element of business continuity focused on technology and infrastructure – often summarised as ‘getting things up and running’ after a disruptive event.

In the call centre environment, a business continuity plan will describe how to restore everything from the way a centre receives in-bound calls to where agents will sit when they answer these calls. Disaster recovery typically addresses the tangible technology – such as servers, IVR (interactive voice response; speech recognition systems) and switches – on which the call centre depends.

This subtle distinction between business continuity and disaster recovery means that organisations must take a two-pronged approach when preparing for any type of business disruption. They must ensure that the hardware and network are operational and, more importantly, that the backup environment delivers the same experience as the production site, both for customers and agents.

Challenging common misconceptions

There are two common misconceptions about business continuity in the call centre.

The first misconception is that it’s enough to know that back-up systems are ‘up and running.’ Recovery systems can be up and running – but still not match the performance of the production site.

In order for customers to have a consistent experience between the two environments, all the transactions they rely on – and which are business-critical – have to operate as they should. For this to happen, all systems and applications at the recovery site must come up smoothly and perform together perfectly the minute you switch across.

Take a phone payment service that a mobile phone company or utility may offer its customers: what if the back-up IVR system and the customer database are both ‘up and running’ but not communicating with each other? The affected company could lose thousands of pounds a minute trying to diagnose and rectify the issue – on the hoof, in the midst of a crisis.
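
The gap between ‘up and running’ and ‘working together’ can be made concrete with a minimal sketch. It assumes a set of per-component status probes and a separate set of end-to-end transaction checks; every function name here is a hypothetical stand-in, not part of any real monitoring product.

```python
from typing import Callable, Dict, List

Check = Callable[[], bool]


def failed_checks(checks: Dict[str, Check]) -> List[str]:
    """Run each check and return the names of those that fail or raise."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # an error during a check counts as a failure
        if not ok:
            failed.append(name)
    return failed


def readiness_report(status: Dict[str, Check],
                     transactions: Dict[str, Check]) -> Dict[str, List[str]]:
    """Report component outages separately from broken transactions."""
    return {
        "down": failed_checks(status),
        "broken_transactions": failed_checks(transactions),
    }


# Hypothetical stand-ins for real probes at the recovery site.
def ivr_is_up() -> bool:
    return True


def customer_db_is_up() -> bool:
    return True


def phone_payment_works() -> bool:
    # Simulates the IVR failing to reach the customer database.
    raise ConnectionError("IVR cannot reach the customer database")


report = readiness_report(
    {"ivr": ivr_is_up, "customer_db": customer_db_is_up},
    {"phone_payment": phone_payment_works},
)
print(report)  # {'down': [], 'broken_transactions': ['phone_payment']}
```

In this sketch every component reports ‘up’, yet the payment transaction still fails, which is exactly the situation an up/down view would miss.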

The second misconception is that it is not necessary to monitor backup systems as stringently as production systems. Companies generally invest a lot of time and money in monitoring their production systems – only to ignore the backup environment. As simple as it sounds, it is important to remind ourselves that backup systems become production systems during a disaster. Consequently, there should be no distinction between the two from a monitoring perspective.

Incorporating testing and monitoring into your continuity planning

There are five things companies should consider while creating business continuity plans for call centres:

• Create thorough documentation and a checklist to ensure that all processes, systems, and customer-facing interactions are covered within the plan.
• Build disaster recovery system testing and monitoring into the company-wide standard system deployment process so that testing and monitoring do not become an afterthought.
• Test the continuity processes and disaster recovery rollover periodically throughout the year. Document the findings and save them with the continuity plans.
• Build a comprehensive monitoring plan for back-up systems to ensure that when they become the production lifeline and start generating revenue, they maintain the level of service customers have come to expect.
• Review the testing and monitoring status of the backup and recovery environment periodically to make sure everything – at primary and secondary sites – is up to date.

So what does this mean in detail?

Involving the experts

Business continuity and disaster recovery plans must take into account the who, what, why, and where of returning business processes to normal, and this process begins with defining exactly what ‘normal’ means.
In the case of call centres, this means identifying which components are critical: the infrastructure, servers, CTI (computer telephony integration), IVR or other applications? Quite frequently, the notion of operational readiness is viewed differently by different stakeholders. It therefore also needs to be agreed whether backup systems have to perform exactly like the production environment, or whether their performance can scale down or vary slightly.

Whatever the consensus, it is not enough to know that all the physical infrastructure elements – at the secondary site – such as servers, networks and T1s – show ‘green’ status. It is essential that the call centre applications themselves are fully operational, so that customers can interact with systems as before.

Here are two recent examples.

During the floods in the UK last year, Gloucestershire County Council had to evacuate its entire staff, including relocating its call centre to a failover site in Wiltshire. While it had made sure that citizens continued to get up-to-date information through the website, keeping its helplines open proved more of a challenge. The social care helpline was the greatest priority, but council staff found that they had no phone systems at first, so all phone lines and the 0800 emergency number had to be transferred. As a result, the social care helpline was only operational again halfway through the morning.

The wildfires in California last year saw one company’s primary site destroyed by fire. Executives were convinced that the primary and secondary sites were load balanced. However, when they switched over to the recovery facility, they realized that the call centre applications at the secondary site had not been updated alongside the primary systems.

The issues emerging from both these crises could have been avoided through consistent disaster recovery testing and monitoring. In fact, in the Californian case, it turned out that the company’s recovery facilities had not been tested in over a year.
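
Both failure modes in the Californian case (applications out of step between sites, and a long gap since the last test) lend themselves to simple automated checks. The sketch below is illustrative only: the application names, version strings, dates and the 90-day threshold are all assumptions, not details from either incident.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple


def find_drift(primary: Dict[str, str],
               secondary: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {app: (primary_version, secondary_version)} for every mismatch.

    An application missing from one site shows up with None on that side.
    """
    drift = {}
    for app in sorted(set(primary) | set(secondary)):
        p, s = primary.get(app), secondary.get(app)
        if p != s:
            drift[app] = (p, s)
    return drift


def failover_test_is_stale(last_tested: date, today: date,
                           max_age_days: int = 90) -> bool:
    """True if the last failover test is older than the agreed interval."""
    return (today - last_tested) > timedelta(days=max_age_days)


# Hypothetical inventories of call centre applications at each site.
primary_apps = {"ivr": "4.2", "cti": "7.1", "routing": "3.0"}
secondary_apps = {"ivr": "4.0", "cti": "7.1"}

drift = find_drift(primary_apps, secondary_apps)
stale = failover_test_is_stale(date(2007, 5, 1), today=date(2008, 6, 20))

print(drift)  # {'ivr': ('4.2', '4.0'), 'routing': ('3.0', None)}
print(stale)  # True
```

Run on a schedule, a check of this kind would flag that the secondary site had fallen behind well before anyone needed to switch over.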

The cases also make clear why call centre stakeholders must be involved in the overall business continuity planning process: after all, they are the experts who understand what impact a disruption will have on the call centre and what trade-offs the company faces in an emergency situation.

Managing the trade-offs

Different organisations may be willing to accept different trade-offs. In many cases, primary sites benefit from state-of-the-art technology while failover sites rely on older equipment relocated from the production site after an upgrade. In the event of an incident, customers have to navigate using less sophisticated automation, such as tone dialling. Similarly, calls may go to a pool of agents rather than being routed directly to an agent with the relevant expertise, as they normally would. As a result, agents may be completely inundated with calls, all while struggling with slower, harder-to-use systems.

Ongoing testing and monitoring will provide organisations with the tools to pinpoint these issues and their potential impact on the customer. They can then decide whether, for example, extended phone queues and a longer wait for customers’ queries to be resolved are acceptable under the circumstances.
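
One way to make such a decision explicit is to agree tolerances in advance and compare failover test results against them. The following is a minimal sketch under stated assumptions: the metric names, the measured figures and the tolerance ratios are invented for illustration, and it assumes that a higher value always means worse service.

```python
from typing import Dict, Tuple


def tolerance_breaches(production: Dict[str, float],
                       failover: Dict[str, float],
                       allowed_ratio: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Return {metric: (failover_value, limit)} where failover exceeds the limit.

    The limit for each metric is the production value multiplied by the
    agreed ratio (for example, 2.0 means 'up to twice as slow is acceptable').
    """
    breaches = {}
    for metric, ratio in allowed_ratio.items():
        limit = production[metric] * ratio
        if failover[metric] > limit:
            breaches[metric] = (failover[metric], limit)
    return breaches


# Hypothetical figures from a production baseline and a failover test.
production = {"avg_wait_seconds": 40, "avg_handle_seconds": 300}
failover = {"avg_wait_seconds": 95, "avg_handle_seconds": 330}
allowed_ratio = {"avg_wait_seconds": 2.0, "avg_handle_seconds": 1.25}

breaches = tolerance_breaches(production, failover, allowed_ratio)
print(breaches)  # {'avg_wait_seconds': (95, 80.0)}
```

Here the longer handling time falls within the agreed tolerance, while the queue wait does not, so stakeholders know which trade-off still needs a decision.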

Without adequate testing and monitoring processes, companies would not even be aware of the existence of these issues. And without the ability to manage trade-offs proactively, even the smallest disruption could leave them open to lengthy downtime, angry customers and substantial losses.

Author: Jeff Weil is vice president of services, Empirix. Jeff is responsible for customer support, professional services and managed services. Prior to joining Empirix, Jeff was vice president, Services, Support and IT for Frictionless Commerce where he built and directed all aspects of enterprise pre- and post-sales customer management as well as both internal IT and the company's profitable application hosting business for Fortune 500 customers. Jeff's previous experience includes 11 years at Progress Software, where he led the product management and product strategy organizations for the company's $300 million software product line. While at Progress, Jeff founded ASPEN, the business unit servicing the ASP market, and directed the America's Field Systems Engineering unit. He also held various development positions at NCR. Jeff holds a B.A. in Economics from St. Lawrence University.

http://www.empirix.com/products-services/contact_center.asp

What boxes should you tick when testing and monitoring failover systems?

• Verify telephony circuits are redirected to disaster recovery system
• Ensure that load balancing can redirect calls under load conditions
• Confirm that call treatment is identical for primary and disaster recovery sites during a private branch exchange (PBX) outage
• Check what happens to customers already in system during an IVR or CRM outage
• Make sure that default routing tables are accurate
• Identify error messages produced by systems during a call centre outage
• Emulate a full data centre outage to capture monitoring events for rapid troubleshooting.
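
As an illustration of the ‘identical call treatment’ item, the same test call can be placed against both sites and the recorded steps compared. This is a minimal sketch: the step names are hypothetical, and it assumes each site's call flow has already been captured as an ordered list of steps.

```python
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


def first_divergence(primary_steps: Sequence[str],
                     dr_steps: Sequence[str]) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, primary_step, dr_step) at the first difference, or None.

    If one call flow ends early, the missing step is reported as None.
    """
    for index, (p, d) in enumerate(zip_longest(primary_steps, dr_steps)):
        if p != d:
            return (index, p, d)
    return None


# Hypothetical traces of the same test call at each site.
primary_trace = ["greeting", "language_menu",
                 "speech_account_capture", "route_to_billing_agent"]
dr_trace = ["greeting", "language_menu",
            "touch_tone_account_capture", "route_to_general_pool"]

divergence = first_divergence(primary_trace, dr_trace)
print(divergence)  # (2, 'speech_account_capture', 'touch_tone_account_capture')
```

A result of None would indicate that the two sites treated the call identically, at least for the steps that were recorded.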

Date: 20th June 2008 • Region: UK/World • Type: Article • Topic: Telecoms continuity

Copyright 2008 Portal Publishing Ltd