David Lightfoot looks back over a long career in BC and DR and highlights the lessons learned so far...
With 30 years within IT approaching, and a career that has taken me to several companies in various countries, service management principles are still the biggest challenge; change management has not managed to stop a technical analyst going live with a solution because ‘he/she thought it was ok and only a minor change’.
Then there is the world of business continuity, something that I have seen from various operational viewpoints: from being responsible for building and testing plans at various stages to being the customer on the receiving end. Business continuity is an interest for me that has continued for many years, only exceeded by my love for Luton Town football club, who have had their own fair share of disasters in recent years!
I remember my first induction to what was then IT DR:
Some 20 years ago I was requested to audit the backup process for a critical PC, one of the first; backup was via floppy disk. The agreed business process was to take two copies of the backup.
Why this system was critical was not really defined; it was done because the manager of that business area always liked things in triplicate and he had very little confidence in IT systems.
The person I met took me through the process they used and took great pride in matching this against the simple one-page procedure, until the line ‘take second copy’. They then showed me a folder marked second backup. With alarm bells ringing I opened the folder: there was an index of over 30 pages and on each page was a dated photo-copy of the diskette! I never did quite work out how these could be read back into the system!!
So there I was: my first business continuity related problem. No second backup for an application, which had not gone through any risk assessment and within an environment where even the first backup was never tested.
Does the above still happen today? Yes, of course it does. Replace the floppy disk with a server DLT, replace the PC with a server and ensure that when the system which runs on the server goes live no risk assessment is carried out... perfect match.
In those days disaster recovery was really seen as something IT had to do to make sure that the IT system could run and be recovered in the event of a problem; the concept that a business area could have a problem which then subsequently required IT support was in its infancy.
In fact, going back 20 years, sometimes technical recovery solutions were discussed, documented and built within the auspices of IT, when really the solution was more business facing: i.e. use a pen and paper.
My role started through the operations or data processing route; I started as a data processing apprentice, working with an IBM 370/145 with hardcopy console.
This initially involved being the butt of practical jokes, such as being asked to find the ‘Number 7’ which had been removed from punch-cards as they were required for a recovery; the list goes on.
The IBM 370/145 ran batch work each day, Monday to Friday, a total of fewer than 50 jobs; now we run thousands, with genuine business critical requirements.
In those days, backups were taken for Direct Access Storage Devices (DASD) or disks, placed on a work ‘wheelbarrow’ and moved across factory yards to a fire-safe. This took place in all weathers, come rain or shine, never to be used, or in some cases, seen again.
As IT grew and developed, more thought was given to disaster recovery and business continuity planning. I cannot remember the exact time but I remember moving into a period where the development of an IT application for a business meant a reduction in the number of employees. My father once managed a payroll team of around 60 staff for a company with 10,000 employees, and I worked for this company as well. When the IT system went in, within a short time the team was reduced to 15 people. Not popular in the Lightfoot household and the cause of many discussions. However, it was at this point that IT recovery and having procedures for the reduced team became key, as there was, as the MD said at the time, ‘little opportunity to turn back time’.
I think I first came across more robust thoughts on disaster recovery when I moved to a larger company around 1983. They were changing from a Honeywell to an IBM environment and I remember being employed as a senior operator due to my IBM experience; luckily they never asked about the wheelbarrow process at the interview!
I found that, within a larger organisation that had planned and budgeted for the transition from Honeywell to IBM, the disaster recovery capability was key and far more in line with overall business continuity plans. This was due to the fact that money had been requested, approved and allocated on the strength of the reliable service and improved performance which had been promised; so from playing with backup and recovery, backups were now critical and tested.
Within this environment the fundamentals of disaster recovery were tackled but not in a way we would do it now.
As the business had funded the transition because it wished for a faster, more reliable IT service, the services which needed to be ‘faster’ were already known. It followed that if a service had to be faster, then it must be critical; therefore we had better make sure that the backups we took worked, otherwise IT would not get additional funding for major projects and all the fun for IT staff would end. Yet despite this all being known, nothing was written down about disaster recovery; it was a start, but a start which was to protect IT as much as it was to help the business.
One of the major items to be tackled then (and today) within business continuity was getting management buy-in and the necessary budget. Many years ago it was a case of only being allowed to do the basics, because those in director roles did not wish to ask for additional funding when many businesses had little or no financial security in general. Although this has now improved, I feel we are still faced with a similar problem, in that those who occupy director positions tend to stay in the role for only two to three years, so again ‘managing’ business continuity expectations can be seen as a better option than the correct level of investment. Risk assessment is still sometimes carried out on the principle of ‘self preservation’.
The one thing that is still a selling point is the disaster hitting home or seeing the disaster hit others; this always tends to be the biggest source of motivation for improvements within business continuity and, unfortunately, I feel this will be the case for many years. That is why the work of organisations such as the Business Continuity Institute is so key, as they market and promote from real life experience and encourage more and more people from different environments to get involved. Now in the world of business continuity there are fewer and fewer excuses for a company or organisation to say ‘we never thought this would happen’, basically because people in their everyday lives do not find this acceptable. So the risk assessment process of ‘self preservation’ has shifted because the answer is ‘take action’ not ‘keep quiet’.
So from talking about the early days of the photo-copy backup, what are the best and the worst things that I have seen?
Probably the worst was when I was consulting for an engineering firm where we had the requirement to carry out a recovery of the HR payroll system, as part of an internal auditing control requested by a third party. Payroll was seen as critical within the company due to the large number of short-term workers employed and the need, therefore, to keep very tight control of records. The plan was to recover our UNIX system to a third party location and then allow four users to test the system via a backup network link. This work involved agreeing the equipment specification and the third party contracts, and organising meetings of the key players. The timescale for the test was not really a driver as we had a few months to establish what we wanted. We saw this as a base for future work, so we wanted to start with good foundations.
So, cutting a long story even shorter, all was well, test day arrived: backups moved; people all in place.
Then the error hit home. We discovered far too late in the day that, while we had a good backup, which was taken each night, two months before the test the UNIX system backup had requested a second tape to complete its work. An analyst in charge at the time realised that the support for this system was Monday to Friday 08:00-18:00 and that the tape could not be changed without manual intervention; he had therefore taken the decision to reduce the amount of data backed up, to get back to the one tape principle. The justification after the event was ‘it was an easy change to do’ and ‘we never do recoveries’. We did manage to resolve this quickly, but not before the IT and business directors were aware and the testing customers had had a wasted day. The re-test went ahead a few weeks later with different IT resource, if you get my meaning...
The best was many years later, in 2006:
Business continuity was well established, with a central team which included representatives from all business lines; and each business line had a defined BCP co-ordinator, including IT.
Monthly update meetings were in place for all parties; actions were assigned and minutes taken.
Risk assessment had been carried out against the only true driver for business continuity: ‘money’, either as a direct loss or a consequential loss.
Therefore there were two defined streams: those which the business dealt with as a business issue and those requiring IT support.
I worked within the IT arena, and knowing what systems were critical and having agreement from the business eased our job tremendously.
It was deemed that recovery had to be within 24 hours for critical systems, so for us it was a case of reviewing our capability for the IT environments which had been defined as critical. We then had the opportunity to say what could be done with existing resources, or what investment was required to make things happen.
Being involved with the business meetings also helped the IT team to understand the business needs. It also enabled the team to understand the key business players and identify those who would not be happy during a test even if God himself patted IT on the back. We all know that these people are in every company, don’t we!
Anyway, due to the above, test dates were booked for a calendar year and my role was IT co-ordination: ensuring that IT resource was lined up, oh, and that bacon sandwiches were lined up for test days! I got more stress if this was omitted than from anything else that the test could throw up.
We had four weeks to plan each test and then on an agreed Saturday upwards of 100 customers would come in, work from pre-allocated desks and carry out testing in accordance with their role and also in accordance with an agreed test script.
The level at which IT carried out recovery was excellent: down to icons on PCs, Internet favourites being in place, phone numbers switched and printer addressing remaining the same as at the work place; even the bacon sandwiches were hot!
So, where have we come in 30 years? For me, I have worked in five different countries where I have met more people I have wished to stay in touch with than those I wish to avoid. Business continuity and disaster recovery wise, very much the same issues around funding and support remain, although now we have formal business continuity teams and companies are far more aware that something not being available is no longer a viable option. So although it was and is a long road, we are now able to see far more control around this whole area: good to hear and still good to be involved with.
Author: David Lightfoot, MBCS, MBCI, CITP
David has worked within several major companies in the UK and Europe: Whitbread, Clifford Chance, Bank of New York Mellon, Barclaycard and Lehman Brothers and more recently Nestle, based within Switzerland. He has spent the last 10 years managing and working within operational and process teams, concentrating on BCP, DR and ITIL practices.

Date: 8th February 2008 • Region: World • Type: Article • Topic: BC general