By Kevin Hanson, Head of Business Continuity Management, Allied Irish Bank.
Background
AIB is the largest financial services company in Ireland. It has a market capitalisation of €18 billion, over 24,000 staff, and some 800 branch locations in Ireland, the UK and Poland. AIB holds a minority position in M&T Bank, based in Buffalo, USA. The bank also has a presence in key cities in Europe and the United States. It is quoted on the Dublin, London and New York Stock Exchanges and its headquarters are in Dublin.
AIB has four data centres, two in Dublin and two in Poland (Poznan and Wroclaw). This paper discusses the relocation of the two data centres in Dublin.
AIB’s first data centre was built in Donnybrook House in 1984 and the second was built in the bank’s headquarters, Bankcentre, in 1991. These two centres provide all IT services for our banks in Ireland, the UK and New York and some services to our bank in Poland.
The physical separation between the two buildings is one mile, which in 1991 was the limit for synchronous data replication. In the ensuing years AIB upgraded these facilities with enhanced power, air conditioning and fire detection and suppression systems. However, in 2005 it was recognised that, from a business continuity and security perspective, the buildings were no longer fit for purpose. A major project would be required to relocate both data centres at considerable cost.
Business case
Before embarking on a project of this scale it was necessary to find a senior executive who would sponsor the project. In 2005, AIB appointed a new Director of Operations & Technology and he agreed to be our business sponsor.
Also, there was a mandatory requirement to develop a rigorous business case, which would require main Board approval. Banks like projects that show strong financial returns, and AIB uses net present value (NPV) and internal rate of return (IRR) criteria in evaluating and ranking internal projects. However, as we started developing the business case, it became clear that there would be no financial return from this investment. Furthermore, the costs over the ten-year life of the project were going to be in the order of €130m, of which €70m would be incurred in 2006 and 2007. The manpower effort was estimated at 78 man-years in 2006 and 2007.
Supporting strategy
Given that there would be no financial return, we needed to develop our business case using non-financial criteria. As set out below, our strategy was driven by several principles, but the main driver was business continuity enhancement.
Sustaining business growth
AIB is growing rapidly and we simply needed more physical space to house new servers and other IT equipment. Also, from an operational risk perspective, the two data centres were too close together and were adjacent to the Dodder River, which has a history of flooding.
Value at risk
About 70 percent of AIB’s profits are earned in Ireland and the UK. We estimated that a major outage would cost the bank €4m per day, but this cost would increase exponentially to the point where, after four days, AIB would be out of business as a result of customer loss of confidence and regulatory intervention. The data centres host ATM and Point of Sale services, with over 1,000 ATMs and 25,000 PoS terminals across the country. These are critical real-time systems. Our Treasury, wholesale and retail payments engines are also housed in these centres, so very significant financial value is processed there.
Business continuity enhancement
AIB’s business is critically dependent on IT services. Volumes are such that manual processing is inadequate and, with younger staff, manual processes have been largely forgotten. Many of our services now run 24x7 and our customers rely heavily on them. Hence, our services must be resilient, with very high levels of service quality and availability. Also, in line with regulatory guidance, we must demonstrate recovery times of two hours or less in respect of many key services. Loss of data or transactions due to a service disruption is also unacceptable. These demands for high availability, short recovery time objectives (RTOs) and recovery point objectives (RPOs) require robust solutions. Very simply, the buildings housing these solutions must be both very resilient and secure.
We decided to use this project as an opportunity to pull distributed servers located in various sites in Ireland and the UK into the data centres. As the cost of network bandwidth has fallen in recent years, this strategy has become easier to justify. It has the advantage that all data centre servers will be monitored by operations staff, regular maintenance and security patches will be applied and operating systems will be upgraded by technical staff. But most importantly, from a business continuity management perspective, all data backups will be fully completed on schedule. We recognise that without the data and an ability to restore it, all business continuity plans and strategies are ineffective. Consolidation complements the business continuity agenda of resilience, minimising RTOs and validating RPOs.
These non-financial criteria of service quality, business continuity enhancement and operational risk mitigation were accepted by senior management and the main Board, and the project was approved in December 2005.
Project timeline
AIB has been enjoying high growth in its core markets, partly as a result of the ‘Celtic Tiger’ and economic growth in its other markets. This has driven demand for more IT equipment in our data centres. The Bank has also adopted an enterprise agenda with the objective of deploying the same IT solutions across all its geographic markets, which has further increased demand for equipment in our data centres. To ensure we had sufficient floor space to accommodate these demands, the relocation project would need to be completed quickly. We were set the task of moving one data centre in 2006 and the second in 2007. This was a major challenge because we would need to:
* Find two suitable data centres in Dublin which would require completion of technical and commercial due diligence and successful contract negotiations. (We had already decided against building our own data centres).
* Equip and fit-out the data suites for power, air conditioning, cabling and cabinetry.
* Move IBM mainframes, Tandem computers, numerous Unix clusters, 100 standalone Unix systems, some 400 Wintel servers, 90 network links and associated disc storage and tape silos in 2006.
* Deploy ‘seed kit’ to protect key services that could not tolerate any service disruption. (For example, we purchased and deployed a new mainframe computer and refreshed our entire disc and tape systems in the new data centre.)
* Ensure there was no loss of customer data during the transition.
* Deliver the project to specification, on schedule and within budget.
This constituted the largest IT project undertaken by the Bank and the largest of its kind ever in Ireland. It was going to require exceptional project management and technical skills.
Project management
In early 2006 we established a Steering Committee, a Project Board and a project management office (PMO), and engaged consultants to assist with the project. The Steering Committee, which comprised senior management and met monthly, provided overall project governance. The Project Board comprised middle management and IT specialists and met weekly. Its remit was to action any items identified by the PMO that were running behind schedule. We retained project management and technical consultants to supplement our own specialist teams.
Three major work streams were established:
- The Facilities team was responsible for the selection of the new data centre and the fit-out of the data suites with power distribution units, air-handling units, etc.
- The Network team was mandated to deliver network connectivity to the new location, to plan the floor-space layout and to oversee installation of all cabling and cabinetry.
- The Planning work stream, which comprised the technical platform owners (e.g. mainframe, midrange, internet, etc.), was to package the server estate into ‘move-groups’ of up to 30 to 40 servers, which would be moved over a series of 12 weekends. This work stream had to understand the application and IT infrastructure interdependencies and optimise the packaging of the move-groups.
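The packaging problem given to the Planning work stream can be illustrated with a short sketch. This is not the method AIB used, which the article does not describe. It assumes that interdependent servers must travel together, clusters them with a union-find structure, and then fills move-groups with a simple first-fit-decreasing heuristic. The function name, the server names and the `max_size` parameter are all hypothetical.

```python
from collections import defaultdict


def build_move_groups(servers, dependencies, max_size=40):
    """Pack servers into move-groups of at most max_size servers.

    Assumption (not from the article): servers linked by a dependency
    must be moved on the same weekend, so each dependency cluster is
    kept whole.
    """
    # Union-find: every server starts as its own cluster.
    parent = {s: s for s in servers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the clusters of any two servers that depend on each other.
    for a, b in dependencies:
        parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for s in servers:
        clusters[find(s)].append(s)

    # First-fit decreasing: place the largest clusters first, each into
    # the first move-group that still has room for it.
    groups = []
    for cluster in sorted(clusters.values(), key=len, reverse=True):
        if len(cluster) > max_size:
            raise ValueError(
                f"cluster of {len(cluster)} servers exceeds max_size={max_size}"
            )
        for group in groups:
            if len(group) + len(cluster) <= max_size:
                group.extend(cluster)
                break
        else:
            groups.append(list(cluster))
    return groups


# Hypothetical estate: six servers, three dependency links.
example_servers = ["db1", "app1", "web1", "db2", "app2", "batch1"]
example_links = [("app1", "db1"), ("web1", "app1"), ("app2", "db2")]
for number, group in enumerate(
    build_move_groups(example_servers, example_links, max_size=4), start=1
):
    print(f"move-group {number}: {group}")
```

In practice the packaging would also have to weigh factors this sketch ignores, such as business criticality, platform type and the staff available on each weekend.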
While the PMO was responsible for overall project management, it also had a communication role to ensure that all key stakeholders were briefed on progress of the project. It was critical that the business owners of the services were notified of key dates when services would be at high risk.
We agreed from the outset that this project would be led by AIB technical management and supported by external consultancy expertise. The consultants were chosen on the basis of proven track record. We knew from previous experience that unless we adopted this approach the project would not transition from project into ‘business as usual’ mode easily.
Project delivery
The Board mandate at end December 2005 was clear: relocate the Donnybrook data centre in 2006 without any service disruptions, do it within budget and ensure all services are available in the new data centre. Then, repeat the process for the second data centre move in 2007.
By end February 2006, our Facilities work stream had reviewed ten data centres in Dublin, short-listed three for due diligence and selected one in North West Dublin, about 15km from our second centre. (We were keen to keep the data centres at least 10km apart, in line with FSA guidance.) Fit-out works commenced immediately and by end April the data suites were handed over to AIB for cabling and cabinetry installations.
In parallel, the Network work stream had to decide whether to build the networks ourselves or seek a managed service. We opted for the latter and, having selected a primary carrier, we worked with them to link the new and two existing data centres using dark fibre and high-bandwidth communications links. These were tested and signed off by end May 2006. Cabling and cabinetry activities were in train simultaneously. This was a mammoth task, involving fibre optic cabling runs measured in kilometres and copper cabling runs totalling hundreds of kilometres. Working seven days a week with our cabling contractors, we completed this work by end July 2006.
New disc arrays and tape silos were deployed in the data centre and when network connectivity was established we started replicating our data from the primary centre to both our secondary data centre and the new site. Extensive testing was conducted to ensure the data was being replicated correctly without any corruption or errors. Validation of this exercise at the end of August was a major milestone.
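The article does not say how the replicated data was validated. As an illustration only, the sketch below shows one common way to check that a replica matches its source: compare SHA-256 checksums file by file. It assumes the data is visible as ordinary files under two directory trees, which may not hold for replicated disc arrays. The function names and the error labels are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_replication_errors(primary_root, replica_root):
    """List files under primary_root that are missing or differ in the replica.

    Returns a list of (relative_path, problem) tuples; an empty list
    means every primary file has an identical copy in the replica.
    """
    primary_root = Path(primary_root)
    replica_root = Path(replica_root)
    errors = []
    for source in sorted(p for p in primary_root.rglob("*") if p.is_file()):
        relative = source.relative_to(primary_root)
        copy = replica_root / relative
        if not copy.is_file():
            errors.append((relative.as_posix(), "missing"))
        elif file_digest(source) != file_digest(copy):
            errors.append((relative.as_posix(), "checksum mismatch"))
    return errors
```

A check of this kind only detects files that are absent or altered on the replica side. It says nothing about replication lag, so it complements rather than replaces testing of the recovery point objective.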
The technical teams who were part of the Planning work stream were now ready to step up and manage the ‘lift and shift’ process of relocating large blocks of servers over the following 12 weekends. Each move-group had been meticulously planned during a ten-week cycle and the activities for each weekend were documented on a minute-by-minute basis. Particular attention was required for the mainframe environment, as this supported over 150 production applications. A control team monitored and reported progress against the planned timeline. As this was the really delicate part of the project, we started cautiously by moving 20 servers, predominantly business continuity rather than production servers, on the first weekend. It went without a hitch. It proved the planning methodology and gave us confidence to increase the number of servers in each move-group and to start relocating mission-critical services. Every weekend for the following three months we followed our minute-by-minute project plans and all move-groups were relocated without incident. On several occasions we encountered an issue with one or two servers in a move-group, but the issues were quickly resolved and the servers were relocated the following weekend. The project was declared a major success by the end of November 2006.
Success factors and learning points
The success factors included a very supportive project sponsor; dedicated and committed in-house technical teams who always went the ‘extra mile’; experienced consultants who provided guidance and project management skills; extensive planning and testing prior to moving production data and services; and constant communication with all stakeholders. There were also learning points. We should have allowed more time for cabling, as we significantly underestimated the effort involved. We could also have created larger move-groups of 80 to 100 servers per weekend, which would have created opportunities in the timeline for ‘rest weekends’ for technical staff. However, we now have the opportunity to apply the lessons learned as we relocate the second data centre in 2007.
Author:
Kevin Hanson
Head of Business Continuity Management
Operations & Technology
AIB

• Date: 19th July 2007 • Region: W. Europe/World • Type: Article • Topic: IT continuity