Please note that this is a page from a previous version of Continuity Central and is no longer being updated.

BlackBerry downtime caused by business continuity failure

Downtime affecting BlackBerry users in several regions of the world has entered its third day, and information is emerging about the cause of the crisis.

According to RIM, the downtime resulted from the failure of a core network switch, followed by the failure of the business continuity processes that were meant to kick in.

RIM explained the situation in a service message posted on Facebook:

“The messaging and browsing delays being experienced by BlackBerry users in Europe, the Middle East, Africa, India, Brazil, Chile and Argentina were caused by a core switch failure within RIM’s infrastructure. Although the system is designed to failover to a back-up switch, the failover did not function as previously tested. As a result, a large backlog of data was generated, and we are now working to clear that backlog and restore normal service as quickly as possible. We apologize for any inconvenience, and we will continue to keep you informed.”

RIM’s problems raise some important issues for all business continuity managers:

  • Successful tests do not guarantee that business continuity strategies will work.
  • Holistic business continuity plans need to consider the failure of failover systems and ensure that strategies are in place to deal with such a situation.
  • High availability systems are not a substitute for conventional business continuity and disaster recovery solutions. The latter provide the belt-and-braces protection needed when high availability measures themselves fail.

Update: October 13th

Better late than never: RIM’s CIO, Robin Bienfait, issued a statement on the BlackBerry situation yesterday. Business continuity managers may be interested in assessing its effectiveness:

Service update from RIM CIO

To All BlackBerry Customers:

I want to first apologize for the service interruptions and delays many of you have been experiencing this week. I also wanted to connect with you directly, give you an update on the service issues we are trying to solve, and answer some of the questions and concerns you’ve expressed.

You’ve depended on us for reliable, real-time communications, and right now we’re letting you down. We are taking this very seriously and have people around the world working around the clock to address this situation. We believe we understand why this happened and we are working to restore normal service levels in all markets as quickly as we can.

Here is the current status of service and issues for the various regions that were impacted:

For Europe, Middle East, India and Africa (EMEIA):
• Email systems are operating and we are continuing to clear any backlogged messages. Support teams are working to minimize the impact on our customers.
• BBM traffic is online and traffic is passing successfully
• Browsing is temporarily unavailable as the Support teams monitor service stability and continue to assess when this service can be safely brought online
• Support teams have added capacity to help with message delivery between regions and continents

For Canada and Latin America:
• Email systems are operating and we are continuing to clear any backlogged messages. Support teams are working to minimize the impact on our customers
• BBM and browsing services are online and traffic is passing successfully (except for three carrier networks in Latin America that are serviced by the EMEIA infrastructure – browsing is temporarily unavailable for those three carrier networks)
• Support teams are investigating reports of BBM delays

For the U.S.:
• Email systems are operating and we are continuing to clear any backlogged messages. Support teams are working to minimize the impact on our customers.
• Support teams have added capacity to help with message delivery between regions and continents
• BBM and browsing services are online and traffic is passing successfully
• Support teams are investigating reports of BBM delays

We will provide regular updates on BlackBerry.com, RIM.com and via our social channels. We are doing everything in our power to restore regular service everywhere and to restore your trust in us.

Yours sincerely,
Robin Bienfait
Chief Information Officer, RIM

Comments

With any critical IT asset, recovery following a disaster hinges on how often the system's resilience and recoverability have been tested and how well these processes have been documented. It's also important to have at least two people trained in recovering each critical system, since if the primary person is suddenly unavailable, the critical system may not be successfully recovered. Develop detailed scripts for each system recovery, review them with the system vendor (especially the person who developed the system, if that person is available), and validate the scripts with both table-top and system-level exercises. Finally, be sure to identify the most critical applications, servers, network assets and other infrastructure elements, and schedule exercises for them more frequently than for less critical systems. The cost of this effort may be high, but consider the cost to the organization if one or more critical systems failed and could not be recovered. Research in Motion has provided all of us with a timely object lesson in the value of exercising and carefully documenting disaster recovery plans.

Paul Kirvan, CISA, FBCI

If an organization like RIM can experience downtime through an infrastructure failure, then other organizations should be seriously looking at measures to safeguard their IT infrastructure. This highlights that businesses of all sizes need resilience measures in place, and protection against failure through secondary or even tertiary business continuity solutions.

Neil Stephenson, CEO, Onyx Group

Date: 12th October 2011 • Region: World • Type: Article • Topic: ITC continuity
Updated 18th October 2011
