Network emulation for business continuity testing

By Sameer Gupta, VP and general manager of professional services, Anue Systems, Inc.

Introduction
This article addresses a critical element of the business continuity lifecycle: testing and rehearsing the plan. When it comes to the technological aspects of business continuity, IT departments are faced with the need to perform rigorous and realistic testing of business continuity procedures and capabilities for network and application systems.

A network business continuity test plan includes testing failover and recovery strategies on the production network with production systems, personnel and procedures. However, this step is a small component of the overall business continuity test plan. The bulk of system testing is done in the lab to avoid needless disruption of the corporate network and the attendant loss of productivity. This article identifies the steps for network business continuity system testing in the lab environment; such testing authentically replicates the load, delays, bandwidth contention, errors and other impairments found in the production and backup networks, providing an accurate prediction for how the plan will work under actual disaster conditions.

BCM system testing: business requirements
When IT departments begin examining network business continuity plans, the discussion can get sidetracked on how to overcome technical issues and limitations. It is important to realize that business requirements drive technical requirements. As the NISCC Good Practice Guide on Telecommunications Resilience points out, a business impact analysis (BIA) focuses on services, not technology.

A BIA evaluates the impact of the failure of each process on all aspects of the organisation. Financial issues and revenue generation are important to consider, but the assessment goes beyond finances to evaluate impacts on other areas such as public image, shareholder confidence, customer service, vendor relations, employee morale, consumer confidence and the ability to comply with financial reporting requirements. The BIA identifies recovery time objectives (RTO) and recovery point objectives (RPO) based on the significance of each process to the strategic and tactical goals of the organisation.

For example, RTOs might vary based on monthly, annual or seasonal business cycles. An automated process might have a manual fallback system as an alternative for short-term recovery. Government regulations may mandate RTOs for some systems or processes. The network business continuity test plan emphasizes the purpose for the process by correlating test results back to the business requirements.

Business continuity system testing: technical requirements
Once business requirements are established, the processes and procedures required to achieve the business requirements are identified. Technical requirements translate a recovery time objective into a service-level objective (SLO). SLOs can be expressed in terms of a target transaction response time (TRT) or number of transactions per second (TPS) for distributed applications, voice quality scores for VoIP systems, or data throughput rates for data transfer/backup or mirroring applications.
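
As an illustration only, the service-level objectives described above could be recorded and checked along the following lines. The class names, field names and target values are hypothetical and are not taken from any particular test tool; the sketch simply shows a measured TRT and TPS being compared against their targets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceLevelObjective:
    """Hypothetical SLO for a distributed application."""
    application: str
    max_trt_ms: float  # slowest acceptable transaction response time
    min_tps: float     # lowest acceptable transactions per second


@dataclass(frozen=True)
class Measurement:
    """Metrics reported by one test run (illustrative fields only)."""
    trt_ms: float
    tps: float


def slo_violations(slo: ServiceLevelObjective, m: Measurement) -> List[str]:
    """Return a description of each SLO term that the measurement violates."""
    problems = []
    if m.trt_ms > slo.max_trt_ms:
        problems.append(f"TRT {m.trt_ms} ms exceeds target {slo.max_trt_ms} ms")
    if m.tps < slo.min_tps:
        problems.append(f"TPS {m.tps} is below target {slo.min_tps}")
    return problems


# Assumed example values, not figures from this article.
order_entry = ServiceLevelObjective("order entry", max_trt_ms=2000, min_tps=50)
print(slo_violations(order_entry, Measurement(trt_ms=2400, tps=61)))
```

Voice quality scores or data throughput rates could be added as further fields in the same way, depending on the application under test.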

Low risk profile environments may allow less rigorous SLOs for the backup network. High risk profile environments may dictate seamless performance between the primary and backup systems.

Simulating network load, distance and impairments
The ultimate test of a network business continuity plan is switching the production network over to the backup network. However, due to the cost of such a disruption and the potential loss of productivity and revenue, once a network business continuity plan is validated and in place, live switchover tests are typically done only once a year. Preparation for a live test is done in the test lab, where a production environment is reproduced through the five steps of network business continuity testing:

1. Create traffic profiles
2. Establish failure thresholds for applications
3. Create network profiles
4. Establish performance and quality metrics for the primary system
5. Establish performance and quality metrics for the backup system.

Prerequisites
To evaluate a business continuity plan, you must have a test bed that enables you to run controlled, repeatable tests. Consistency and repeatability are the foundation for meaningful testing and evaluation.

Components of an evaluation test bed include:
- System under test: the combination of business applications running on the production network. The servers and applications are not a subset of the production network; they mirror the production network.

- Traffic generator/analyzer: A traffic generator/analyzer capable of generating a customizable mix of traffic types (voice, video, data), emulating user/traffic behaviour in actual networks (including normal business operation patterns and congested/overload patterns), and reporting results, such as TRT, TPS, quality scores and data throughput.

- Network emulation: A network emulator that can be configured to create the delay, packet loss, reordering and errors that occur on the production network and the backup network. To establish the threshold conditions in step 2 (see above), the emulator must be able to ramp impairments while the test is running without disrupting traffic. To create realistic conditions based on the profiles in step 3, the emulator must support standards-based emulation and capture/replay of network conditions.

Step 1: create traffic profiles
To understand the performance of your network and applications, you must first understand the nature of the user base. As part of the second step in the business continuity lifecycle, the application QA team studies the organisation to assess periodic and event-driven usage and behaviour, such as time-of-day variations, end-of-month/quarter/year demands and factors that may be unique to the organisation or industry. The results form the basis of a set of traffic-load and user-behaviour profiles that reflect the range of expected usage patterns. These profiles include: the log-in storms that occur in the morning and after lunch; typical usage with full attendance; and peak usage during crunch times, such as end-of-period reconciliation and reporting. The profiles are used to configure the traffic generator/analyzer, either dynamically or via scripts.
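
A set of load/behaviour profiles like the ones described in this step might be captured in a form such as the following before being fed to a traffic generator. This is a minimal sketch: the TrafficProfile class, its fields and every number in it are assumptions made for illustration, and a real traffic generator/analyzer will have its own configuration format.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TrafficProfile:
    """Hypothetical load/behaviour profile for a traffic generator."""
    name: str
    concurrent_users: int
    transactions_per_user_per_min: float
    mix: Dict[str, float]  # share of load per traffic type, must sum to 1

    def __post_init__(self) -> None:
        total = sum(self.mix.values())
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"{self.name}: traffic mix sums to {total}, expected 1.0")

    def offered_tps(self) -> float:
        """Transactions per second the generator would need to offer."""
        return self.concurrent_users * self.transactions_per_user_per_min / 60.0


# Assumed example values, not measurements from any real organisation.
PROFILES = [
    TrafficProfile("morning log-in storm", 1200, 3.0, {"data": 0.9, "voice": 0.1}),
    TrafficProfile("typical usage", 1000, 1.5, {"data": 0.6, "voice": 0.3, "video": 0.1}),
    TrafficProfile("end-of-period peak", 1000, 4.5, {"data": 0.8, "voice": 0.15, "video": 0.05}),
]

for profile in PROFILES:
    print(profile.name, profile.offered_tps())
```

Validating the traffic mix when the profile is created catches configuration mistakes before a test run rather than after it.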

Step 2: establish impairment failure thresholds for applications
An application can fail for many reasons. This step identifies the limit of the application's ability to process traffic in the presence of network impairments. Traffic is generated using the load/behaviour profiles established in step 1. Network delay and impairment settings are ramped up in steps, and failure points (the levels at which the performance metrics violate the SLO) are identified. Network impairment threshold testing helps identify the service level agreement terms required for both primary and backup networks to assure acceptable application performance. All network profiles created in step 3, including backup network profiles, must fall within the bounds of the failure thresholds established in step 2.
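
The ramping procedure in this step can be sketched as a simple loop. In the sketch below, measure_trt_ms is a hypothetical stand-in for a real test run (traffic generator plus emulator), and toy_measure is an invented model in which each transaction needs four one-way network trips; neither reflects the behaviour of any actual product or application.

```python
from typing import Callable, Optional, Tuple


def find_failure_threshold(
    measure_trt_ms: Callable[[int], float],
    slo_max_trt_ms: float,
    start_ms: int,
    step_ms: int,
    limit_ms: int,
) -> Tuple[Optional[int], Optional[int]]:
    """Ramp added delay in fixed steps until the measured TRT violates the SLO.

    Returns (last passing delay, first failing delay). The second value is
    None if the SLO still holds at the highest delay tested.
    """
    last_pass = None
    delay = start_ms
    while delay <= limit_ms:
        if measure_trt_ms(delay) > slo_max_trt_ms:
            return last_pass, delay
        last_pass = delay
        delay += step_ms
    return last_pass, None


def toy_measure(delay_ms: int) -> float:
    """Invented model: 200 ms of processing plus four one-way network trips."""
    return 200 + 4 * delay_ms


# Assumed SLO of 1000 ms, ramping delay from 0 to 500 ms in 25 ms steps.
print(find_failure_threshold(toy_measure, 1000, start_ms=0, step_ms=25, limit_ms=500))
```

The same loop could be repeated for packet loss, jitter or other impairments, one impairment at a time, to build up a set of thresholds per application.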

Step 3: create network profiles
The network team provides a set of network profiles that reflect the characteristics of each type of network connection. Delay and impairments, such as packet jitter, loss, re-order, modification and bit errors, will occur to varying degrees, depending on the core network and access method. An on-site profile will have LAN speeds and minimal impairment. A remote office profile will have WAN speeds and more impairment. A home office profile might have broadband access speeds and delays for VPN security. A mobile user may have a variety of connection options, including dialup, either directly into the company network or to a provider and then via VPN, or broadband with VPN.

If the organisation has an onsite or local data centre and a remote backup data centre, the delay and impairments in the core network will likely differ significantly between the primary and backup networks. If the organisation uses an off-site data centre that offers geographically diverse redundancy services, the difference in delay and impairments between the primary and backup networks might not be as significant.
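
The network profiles from this step, together with the step 2 requirement that every profile fall within the failure thresholds, might be represented as follows. The class names and all delay, jitter and loss figures are assumptions chosen for illustration; real values would come from the network team's measurements and from threshold testing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NetworkProfile:
    """Hypothetical description of one type of network connection."""
    name: str
    delay_ms: float
    jitter_ms: float
    loss_pct: float


@dataclass(frozen=True)
class FailureThresholds:
    """Impairment limits found in step 2 (illustrative fields only)."""
    max_delay_ms: float
    max_jitter_ms: float
    max_loss_pct: float


def exceeded_limits(profile: NetworkProfile, limits: FailureThresholds) -> List[str]:
    """Name each impairment for which the profile exceeds its threshold."""
    exceeded = []
    if profile.delay_ms > limits.max_delay_ms:
        exceeded.append("delay")
    if profile.jitter_ms > limits.max_jitter_ms:
        exceeded.append("jitter")
    if profile.loss_pct > limits.max_loss_pct:
        exceeded.append("loss")
    return exceeded


# Assumed example values only.
LIMITS = FailureThresholds(max_delay_ms=200, max_jitter_ms=30, max_loss_pct=1.0)
PROFILES = [
    NetworkProfile("on-site LAN", delay_ms=1, jitter_ms=0.5, loss_pct=0.0),
    NetworkProfile("remote office WAN", delay_ms=40, jitter_ms=10, loss_pct=0.2),
    NetworkProfile("home office VPN", delay_ms=90, jitter_ms=25, loss_pct=0.5),
    NetworkProfile("remote backup data centre", delay_ms=230, jitter_ms=20, loss_pct=0.3),
]

for p in PROFILES:
    print(p.name, exceeded_limits(p, LIMITS) or "within thresholds")
```

A profile that exceeds a threshold, such as the backup data centre in this invented example, signals that either the application or the network terms need attention before further testing.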

Step 4: establish performance and quality metrics for the primary system
Each load/behaviour profile is used to generate application traffic across primary system network profiles. The analyzer gathers statistics from each test run. These metrics establish the baseline for each profile, which is compared to SLO. The results could indicate a need to optimize an application on the production network before proceeding with more testing. An application that fails or barely passes SLO in an optimum network will not have acceptable performance in a disaster situation.

Step 5: establish performance and quality metrics for the backup system
Each test run in step 4 is run again, against each of the backup network profiles. The metrics reported from each test are compared to the SLO to verify that the system can still deliver acceptable performance and quality under the expected backup network conditions. Violations of SLO are subjected to troubleshooting to determine the root cause and remedial measures.
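
Steps 4 and 5 amount to running every load/behaviour profile across every network profile and comparing each result with the SLO. A minimal sketch of that test matrix is shown below. The run_test callable stands in for a real test run, and the profile names, base response times and delays in the toy model are invented for illustration.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def run_matrix(
    traffic_profiles: Iterable[str],
    network_profiles: Iterable[str],
    run_test: Callable[[str, str], float],
    slo_max_trt_ms: float,
) -> List[Tuple[str, str, float]]:
    """Run each traffic profile over each network profile; return SLO violations."""
    violations = []
    for traffic, network in product(traffic_profiles, network_profiles):
        trt_ms = run_test(traffic, network)
        if trt_ms > slo_max_trt_ms:
            violations.append((traffic, network, trt_ms))
    return violations


# Toy model with assumed numbers: base TRT per traffic profile plus
# four one-way network trips per transaction.
BASE_TRT_MS = {"login storm": 900, "typical": 400, "period end": 700}
DELAY_MS = {"primary WAN": 20, "backup WAN": 80}


def toy_run(traffic: str, network: str) -> float:
    return BASE_TRT_MS[traffic] + 4 * DELAY_MS[network]


primary = run_matrix(BASE_TRT_MS, ["primary WAN"], toy_run, slo_max_trt_ms=1000)
backup = run_matrix(BASE_TRT_MS, ["backup WAN"], toy_run, slo_max_trt_ms=1000)
print("primary violations:", primary)
print("backup violations:", backup)
```

In this invented example the primary network passes every case while the backup network fails two of them, which is the kind of result that would be handed to troubleshooting for root cause analysis.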

Achieving realism in lab testing
Business continuity planning is about reducing risk. To truly reduce risk, the network business continuity test plan must authentically replicate the application traffic, user behaviour, network delays, bandwidth contention, errors and other impairments found in the production and backup networks.

Real application traffic is generated by thousands of users who have multiple characteristic behaviour profiles. Real application traffic has characteristics that vary by the type of application and transaction. Real production and backup network conditions vary significantly from the conditions on a test LAN. Real-world testing uses knowledge of how real users, real applications and real networks behave to re-create a realistic environment in which to test applications in five steps.

Realistic configuration of network emulation profiles can be accomplished in two ways. Capture/playback involves capturing the end-to-end packet loss, delay and jitter for each profile and then playing back the conditions dynamically during testing so that the emulator replicates actual conditions on a packet-by-packet basis. Standards-based emulation uses statistical models found in documents such as ITU-T G.1050 or TIA-921 to create realistic network conditions.
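
The difference between the two approaches can be sketched in a few lines. In the example below, replay applies a captured trace to packets in order, while burst_loss generates loss from a simple two-state (Gilbert-style) model. The two-state model is used purely as a stand-in for a statistical model; it is not the model defined in ITU-T G.1050 or TIA-921, and the function names, trace values and probabilities are all assumptions.

```python
import random
from itertools import cycle, islice
from typing import List, Sequence, Tuple

Sample = Tuple[float, bool]  # (delay in ms, packet lost?)


def replay(trace: Sequence[Sample], packet_count: int) -> List[Sample]:
    """Capture/playback: apply captured samples packet by packet,
    looping over the trace if it is shorter than the test."""
    return list(islice(cycle(trace), packet_count))


def burst_loss(packet_count: int, p_good_to_bad: float,
               p_bad_to_good: float, seed: int) -> List[bool]:
    """Statistical emulation with a simple two-state model: packets are
    delivered in the good state and lost in the bad state."""
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    bad = False
    lost = []
    for _ in range(packet_count):
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        lost.append(bad)
    return lost


# Assumed example trace and probabilities.
captured = [(10.0, False), (12.5, False), (48.0, True)]
print(replay(captured, 5))
print(sum(burst_loss(1000, 0.02, 0.5, seed=1)), "packets lost out of 1000")
```

Seeding the random generator keeps the statistical approach repeatable, so that two test runs see the same sequence of impairments.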

Re-testing, repeatability and compliance
When the network or the systems running over it change, the business continuity plan must be modified to accommodate the change. And of course once the plan changes, additional testing is required to validate that the new plan can achieve the target RTOs and SLOs. During re-testing, the power of real-world testing becomes apparent. The effort to create profiles for initial testing can now be leveraged to quickly validate the new system. And, because of the precision of the real-world test system as it emulates application traffic, user behaviour and network conditions, each test can be executed with the same user and network profiles to provide an apples-to-apples comparison between the old system and the new system in terms of performance and quality metrics. In addition, test automation can make re-testing a simple, reliable, unattended process.
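
The apples-to-apples comparison described above might look like the following sketch, in which the same test cases are run against the old and new systems and any case whose response time worsens beyond a tolerance is flagged. The function name, the 5% tolerance and the sample figures are assumptions for illustration, and the sketch assumes the re-test covers every case in the baseline.

```python
from typing import Dict, List, Tuple

Case = Tuple[str, str]  # (traffic profile, network profile)


def regressions(baseline: Dict[Case, float], retest: Dict[Case, float],
                tolerance_pct: float = 5.0) -> List[Tuple[Case, float]]:
    """List cases whose TRT worsened by more than tolerance_pct."""
    worse = []
    for case, old_trt_ms in baseline.items():
        new_trt_ms = retest[case]  # same profiles must be used for both runs
        change_pct = (new_trt_ms - old_trt_ms) / old_trt_ms * 100.0
        if change_pct > tolerance_pct:
            worse.append((case, change_pct))
    return worse


# Assumed example results in milliseconds.
old_system = {("typical", "backup WAN"): 800.0, ("login storm", "backup WAN"): 1000.0}
new_system = {("typical", "backup WAN"): 820.0, ("login storm", "backup WAN"): 1200.0}

for case, change in regressions(old_system, new_system):
    print(case, f"TRT worse by {change:.1f}%")
```

Because both runs use identical user and network profiles, any difference flagged here can be attributed to the change in the system rather than to the test conditions.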

As a result, real-world testing simplifies the ongoing process of maintaining regulatory compliance in the face of a dynamically changing organisational environment.

Conclusion
Recent technological advances have made it possible to authentically replicate a production network environment in a test lab. Real-world testing uses precision network emulators and a traffic generator/analyzer to create realistic and rigorous conditions for testing, tuning and verifying the performance of business continuity systems and subsystems prior to conducting the full-scale test on the production network. In addition to sparing the expense and disruption of testing on the corporate network, real-world network testing in the lab produces the reliable and repeatable results required for compliance with industry-specific business continuity requirements.

Author
Sameer Gupta is VP and general manager of professional services and a co-founder of Anue Systems. Anue Systems is a provider of application testing and validation solutions. With products and services that emulate real-world network behaviour and assess its impact on mission critical applications, Anue helps organisations implement business continuity solutions with confidence. Prior to Anue Systems, Sameer was a member of technical staff in SONET IC development at Agere Systems, a senior consultant with Synopsys Professional Services and an ASIC designer for a number of companies including Atmel, Silicon Graphics, AMD and HP Convex.

Sameer holds a Bachelor's degree in Computer Engineering from Old Dominion University and a Master's in Electrical Engineering from Purdue University.
Anue Systems will be presenting at Automata’s ‘BCM Challenges for 2008’ conference on 31 January and 1 February at Old Windsor, near Heathrow:
http://www.automataservices.com/conference.htm

Date: 24th January 2008 • Region: UK/World • Type: Article • Topic: IT continuity
Copyright 2010 Portal Publishing Ltd