Virtualization: risks and opportunities
Virtualization is now commonplace in many organizations, but the potential risks associated with it are not always considered. By Adrian Moir.
Server virtualization cuts the ties between the physical and the logical. It means one physical machine can pretend to be (or emulate) several virtual servers, each with its own users, applications and data, but it also means that each of those virtual servers is no longer tied to that hardware - it can be moved to another physical host and run there.
In some ways, virtualization is a throwback to the days when a whole bunch of applications would share a single minicomputer or mainframe. In between then and now, we removed complexity and interdependencies by giving each application its own cheap server, but that ramped up the overall power consumption and moved the complexity elsewhere in the infrastructure.
As the overall server population grew, and as floor space and energy consumption became major issues, it became clear that having a large number of servers, many of them running at just 10 or 20 percent of capacity, was no longer sustainable.
What virtualization does is to give us a way to bring back the efficiency of shared hardware - after all, you wouldn't give each application its own physical LAN - while keeping the technical and administrative advantages of having an individual ‘environment’ for each application.
However, while many people still focus on the server consolidation angle - and it is not unusual for organizations with dozens of servers to be able to consolidate them into a handful of large physical machines - it is much more than that. Innovative users have realised that virtualization can make vital contributions in many other ways; in particular they are using it to improve application availability and enhance their disaster recovery capabilities.
That's because the ability to run multiple virtual machines - or VMs - on each physical server means you can replicate an application server without having to double your hardware population. And because virtualization also makes the software independent of the hardware, the replica can run wherever you have spare capacity.
VMs are quicker and easier to provision than physical servers too, making it feasible to load-balance by adding new VMs to a pool. And they are simpler to back up and replicate, as each one is merely a file that can be saved whole or copied to a secondary server or data centre.
However, these benefits only become apparent if you do it right - if you do it wrong, server virtualization can instead make backups harder to manage and more expensive. Doing it right requires you to have thought through the data protection aspects and included them when planning your virtualization strategy. However, early adopters often assumed they could simply transplant their existing backup processes from the physical into the virtual, and this is rarely the case.
In addition, while virtualized servers can be much more productive than non-virtualized servers, running dozens of VMs on a single piece of hardware puts a lot of eggs in one basket. It is vital therefore to protect the application data within those VMs, and ensure that if anything happens to them, the system can recover and continue operating.
There are also risks in virtualization being so easy to do. Physical servers are relatively easy to audit, but virtualization allows departments or even individual users to set up VMs that the IT department knows nothing about and cannot see. As a result, we are starting to see islands of virtualization, with virtual server sprawl replacing physical server sprawl, so you need tools to automatically discover VMs.
Virtualization key concepts
A virtualized server has a compact and highly efficient layer of supervisory software, called a hypervisor, which sits between the hardware and operating systems such as Microsoft Windows or Linux. The hypervisor - VMware's ESX and now vSphere are by far the most widely used, but others include Microsoft Hyper-V, Parallels Virtuozzo, Virtual Iron, and Xen - pools the server's physical resources and allocates them to the various guest operating systems and applications.
The hypervisor presents the guest operating systems and applications with all the software hooks and interfaces that the hardware would normally present. It also arbitrates between the guests, allowing them to share the host machine's resources - so for example it also emulates an Ethernet switch, enabling the guests to share the host's network connection.
Each guest operating system - or VM - therefore "sees" a physical server that belongs to it and its applications, but in reality it is running in a virtual server that interfaces to and shares the physical server's resources via the hypervisor. This approach brings four key advantages:
• Isolation - each VM is isolated from the host operating system and from the other VMs on the same machine. That means that if one application or VM crashes, it should not affect the others sharing the host. In addition, data cannot pass from one VM to another, except through defined network connections.
• Hardware independence - you can run the same VM on any other host running the same hypervisor without modification and without needing to reinstall the application or operating system. That's because the hypervisor acts as a kind of software shim, hiding the real hardware underneath and breaking the traditional operating system-to-hardware ties.
• Encapsulation - each VM is saved as a file. Backing up, moving or replicating a VM can be as simple as copying or moving that file, using tools such as VMware VMotion (which can move a VM from one server to another while it remains active) or data protection software from VMware partners.
• Partitioning - not only can you run multiple VMs on one physical machine, but a single machine can support multiple different applications and operating systems. Physical resources are pooled and can be allocated to the VMs in a controlled way.
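Because a VM is encapsulated as ordinary files, the simplest form of image backup really is a file copy. The sketch below simulates that with a mock VM directory; the `.vmx`/`.vmdk` file names follow VMware's conventions, but the paths and contents are invented for illustration:

```python
import shutil
import tempfile
from pathlib import Path

def backup_vm(vm_dir: Path, backup_root: Path) -> Path:
    """Copy an entire VM directory (config file plus virtual disks) to a
    backup location. Encapsulation means this copy is a complete,
    restorable image of the guest."""
    dest = backup_root / vm_dir.name
    shutil.copytree(vm_dir, dest)
    return dest

# Build a mock VM directory (file names and contents are illustrative)
root = Path(tempfile.mkdtemp())
vm = root / "mailserver01"
vm.mkdir()
(vm / "mailserver01.vmx").write_text('guestOS = "winNetStandard"\n')
(vm / "mailserver01.vmdk").write_bytes(b"\x00" * 1024)  # stand-in disk image

copy = backup_vm(vm, root / "backups")
print(sorted(p.name for p in copy.iterdir()))
```

The same property is what lets tools like VMotion treat a running VM as a movable unit, since its entire state lives in files rather than in a particular machine.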
The dynamic data centre
The logic of server virtualization is inescapable - it can make your servers hugely more efficient, but it can also do a whole lot more. It can greatly enhance your business continuity and disaster recovery capabilities, for example, and it could even help you build an adaptive data centre which automatically reconfigures itself to adapt to changing workloads.
As well as straightforward consolidation, organizations are using virtualization to improve application availability. That's because a VM could also be moved to a more powerful server, backed up whole, or mirrored to a second site.
Provisioning new servers becomes faster and easier too, so instead of having to justify the purchase of hardware when you need a new application server, then wait for it to be delivered and set up, you can use tools such as VMware Lifecycle Manager to automate the provisioning of a suitable VM that will use spare capacity on an existing server.
You can even use cloning tools to create a template for new VMs, so copies of the original can be created as needed, with the operating system already installed. This technique can also be used to clone application servers, for example to add capacity to an application cluster.
Automate all these processes, and you have the makings of an adaptive and self-healing utility data centre, able to reconfigure itself to meet the load upon it and minimise the impact of a hardware or software failure by automatically moving and reassigning resources. For example, processor resources could be taken from one application and used to start extra virtual servers for a higher-priority application.
One of the risks is that an organization may get into virtualization because it sees the potential savings on hardware and power, and only then discover all the other possibilities. That means it is important to learn about the subject and get independent advice before starting the project, for example from a reseller or system integrator who's built these kinds of systems before. Make sure too that the vendors involved have already done the necessary interoperability testing - and not just with VMware, but with the other hardware and software vendors involved as well.
The infrastructure changes that come with virtualization and the dynamic data centre are also bringing back the need for strong capacity planning skills. Capacity planning is vital because as you consolidate servers and their workloads, you also change the loadings and usage patterns for both your storage and your network.
Changing usage patterns are not necessarily a bad thing - they could allow you to consolidate other parts of your IT infrastructure for even greater efficiencies, for instance. However, they could also come as a nasty surprise to an organization that's already spent a lot of money on hardware and software, and now finds that it got its initial plans wrong.
Protecting virtualized environments
Part of building data protection into a virtualization strategy is to expect the hardware loading to be dramatically different. Virtualized servers can be much more productive than physical servers, running 30 or more VMs per machine, but that in turn means 30 servers sharing a single network connection and a single boot disk.
It also means less spare processor capacity for occasional tasks - the whole point is to get up to 70 percent or 80 percent server utilization, but that might not leave you enough room for the extra load of running backups.
Yet because virtualized servers are so much more productive than physical servers, it is even more important to protect the data produced by the applications running in them. It is also vital to ensure that if anything occurs to upset the availability of these applications, the system can recover and continue operating.
One way to deliver data protection in a virtual environment is to move the backup and recovery load off the application server and into the storage tier. You can then use tools provided with the hypervisor to snapshot the virtual machine in the background and move the copy via a dedicated backup server for migration to tier-2 disk or tape.
Most of the major backup software suppliers and enterprise storage vendors have invested time and money in understanding server virtualization, and as a result many now have tools to protect virtual servers.
VMware itself offers tools to address some of the issues - examples are VMotion for dealing with planned outages, Dynamic Resource Scheduler (DRS) for load balancing across a virtualized infrastructure, Site Recovery Manager (SRM) for disaster recovery planning and execution, and VM mirroring for fault-tolerant HA.
More is needed though, and because virtualized infrastructures work differently from physical ones, they also need different data protection techniques. For instance, traditional backup methods expect a one-to-one relationship between server hardware and software, and so do not always seamlessly adapt when each system has 30 virtual servers. That could be a lot of backup agents to buy, install and manage.
Then there is the question of whether your corporate standard for backup software will actually work with VMs, and if it will, are you licensed to use it that way? Backup vendors can license quite differently: some may need a licence per machine, virtual or physical, say, and then there may be extra modules which can be licensed per-VM or per-backup server.
There are alternative routes: technologies are available that sit outside the VM server and are clientless and agentless. If these can be licensed per Virtual Centre Server or per host machine rather than per VM, and integrate seamlessly with the VM environment, they suit what is often still a mixed physical and virtual data centre; many also provide self-discovery and awareness of VM movement.
Also, there will still be times when you do need an agent or client on a VM, such as when you want to back up a specific application - the application doesn't know whether it is running on physical or virtual hardware, so it behaves the same, and data protection tools need to offer application-aware features. Even this may be avoidable, though, if you use a snapshot tool such as Microsoft VSS (Volume Shadow Copy Service) to create a stable proxy environment from which you can do granular backups. You will still need to quiesce the VM and application to snapshot a valid image, of course, and this whole operation should be synchronised by the data protection tools in order to guarantee consistency.
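The sequence described above - quiesce, snapshot, resume, then back up from the stable copy - can be sketched in miniature. This is a toy simulation, not VSS itself; the Application class and the record names are invented purely to show why the ordering matters:

```python
import contextlib

class Application:
    """Toy stand-in for an application running inside a VM."""
    def __init__(self):
        self.on_disk = []           # committed records

    def write(self, record):
        self.on_disk.append(record)

    def quiesce(self):
        # In a real system this flushes buffers and pauses new writes,
        # as VSS asks its writers to do before the shadow copy is cut.
        pass

@contextlib.contextmanager
def quiesced_snapshot(app):
    """Quiesce, snapshot, resume: this ordering is what makes the
    snapshot a consistent image rather than a crash-consistent one."""
    app.quiesce()                   # 1. bring the app to a stable state
    snapshot = list(app.on_disk)    # 2. take the point-in-time copy
    yield snapshot                  # 3. app resumes; backup reads the copy

app = Application()
app.write("mailbox:alice")
with quiesced_snapshot(app) as snap:
    app.write("mailbox:bob")        # live writes continue after the snapshot
granular = [r for r in snap if r == "mailbox:alice"]  # single-item restore
print(granular)
```

The key point is that the backup never touches the live data: it reads from the quiesced point-in-time copy while the application carries on writing.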
The degree of integration with the virtual infrastructure can vary too, and of course when the hypervisor is updated, it can take time for third-party tools to catch up. Tools that integrate with the hypervisor's APIs and CLIs are more likely to remain consistent and up-to-date as the platforms evolve.
Dealing with virtual server sprawl
Traditional data protection methods assume that you know the servers are there and need protecting - but that won't necessarily be the case with VMs. Virtualization makes it so easy to create new systems that the result is sometimes called virtual server sprawl, with the result that organizations find themselves with critical data residing on VMs that are unprotected because the IT and storage admins don't even know they are there.
To make matters worse, in the past you had to know the virtual infrastructure and manage its backup semi-manually, modifying scripts and so on when new VMs were deployed. Fortunately, data protection tools can now audit VMs automatically to see what operating system they run, where they are hosted, and so on. This automated discovery of the environment means the backup administrator no longer needs to know about the physical layer and resources, as that is all handled in software.
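At its core, this discovery step boils down to diffing the hypervisor's VM inventory against the backup catalogue. A minimal sketch, with invented VM names standing in for what a real hypervisor API query would return:

```python
def find_unprotected(inventory, protected):
    """Automated discovery: diff the hypervisor's VM inventory against
    the backup catalogue, flagging VMs that nobody is protecting."""
    return sorted(set(inventory) - set(protected))

# Hypothetical names; a real tool would query the hypervisor's API
inventory = ["web01", "web02", "mail01", "dev-test-jim"]
protected = ["web01", "web02", "mail01"]
print(find_unprotected(inventory, protected))
```

A departmental VM like the hypothetical "dev-test-jim" above is exactly the kind of system that slips through when the catalogue is maintained by hand.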
Virtual server sprawl can also leave organizations trying to back up an unmanageable number of VMs individually. Fortunately, encapsulation provides an effective alternative - simply image the whole VM. What makes this possible is the level of abstraction provided by the hypervisor. That's the software shim, such as VMware ESX / vSphere or Microsoft Hyper-V, that presents the VM with the illusion of its own physical server.
This abstraction means that a VM is really just a single file or directory on its host machine, which means you can quickly copy it to another server acting as a hot spare. A notable advantage over traditional high availability mirroring is that the replica doesn't have to run on the same hardware, making mirroring cheaper to achieve.
However, encapsulation doesn't answer all disaster recovery needs, and while VM images are very useful, they are not adequate for granular recovery of business-critical applications. For example, you might only want one mailbox from the mail server, but with an image backup you would need to mount the image, or at least restore it to disk, in order to access that single item - an unnecessary expenditure of time and resources. Organizations should therefore consider data protection solutions that incorporate granular recovery capabilities, and this does not always mean looking at traditional backup and recovery solutions. Innovations in disk-to-disk solutions that use continuous data protection (CDP) techniques are emerging, and these allow organizations to protect virtual environments with minimal impact, provide granular and high-speed recovery, and tie in application-aware features.
There's never only one correct way to protect data though. It is a question of using the right technique at the right time, and you will probably need a mix of different techniques. There are tools now that let you pull individual files out of a snapshot, for instance. But if you don't want to have to use multiple backup products to achieve that, you will need one product that does several of them - image backup, granular backup and disaster recovery, say, all in one.
Once again, VM imaging changes your usage patterns, because server virtualization tends to result in a big jump in the server 'population', which in turn means there will be a lot more images to manage and store. Deduplication technology can help here, with users reporting compression ratios of 20:1 or higher, because a large amount of VM content - operating system and application files, for example - is common to every image.
It is also important to store the data separately from the application server, not least because it offers much more opportunity for deduplication. In some cases, such as webserver clusters, it also means application servers can simply be cloned or deleted as needed. However, this will also add to the load on the data centre's storage back-end which may mean you need to invest in Fibre Channel or iSCSI for shared storage networking.
And finally, you should not consider the virtualized data centre as an island. For many organizations, the migration to virtualization is a gradual process, and so there is a time when the physical and virtual need to co-exist. Data protection solutions should provide support for both environments, allowing shared resources such as tier-2 disk and tape, libraries and backup servers. This provides both a cost effective data protection solution, and the security that the whole computing environment is protected at all phases of the migration.
Business continuity and server migration
Virtualization is proving particularly powerful when it comes to business continuity and disaster recovery; however, this is a complex area where partnerships between vendors are vital.
Business continuity is all about maintaining application availability and delivering an uninterrupted service to your users. Part of that is disaster recovery, where you might need to rebuild a failed, damaged or stolen server, or perhaps switch your entire operation to a secondary data centre. There are two distinct things to deal with here though: planned downtime, which is by far the largest cause of service interruption, and unplanned downtime.
Planned downtime is typically for maintenance - to patch or update a server's software, say, or replace or upgrade its hardware. With virtualization, many of these tasks no longer even need downtime, as tools such as VMware's VMotion can move the affected VMs to other servers while they carry on running, for instance.
And where downtime is needed - to apply a software patch, for example - you can use VMware's snapshot capability to take a copy of the VM before applying the patch, then quickly roll-back to the original version of the VM should the patch cause problems.
Increasingly, backup vendors are getting involved here as well. Their ability to image entire servers and do bare-metal restores, copying the backup to different hardware, is now being adapted to restore or migrate to a VM instead, or from one hypervisor to another. That opens the way for organizations to use different hypervisors for different purposes, say VMware for production and Hyper-V or Xen for secondary servers or in development.
There is synergy here with server replication and fail-over tools, as well. If you can restore to any server, be it physical or virtual, a physical server can be replicated to a VM, making fail-over cheaper to do and opening these more sophisticated data protection schemes up to smaller organizations.
As for unplanned downtime, the ability to back up a VM whole, mirror it to a second data centre, or simply restart it on a more powerful server - or a server normally used for other tasks, such as software development and testing - is hugely significant when you need to recover from a system failure or fix an application performance problem.
While it is true that server virtualization can make the system administrator's life simpler and more cost effective, it will also add complexity in a number of notable areas. Even with VMs, you still need to manage and protect each virtual server, and as it scales up - or sprawls - the virtualized data centre of the future could have hundreds of VMs.
On top of that, the VMs will not only be in the data centre. Departments and end users will set up their own, realising that it allows them to acquire another server without having to go through a slow and laborious procurement process. Add the emerging option of cloud-based VMs, and you could have thousands to manage and protect.
Tracking and discovery will therefore become major challenges, so you should check that your software supplier can provide discovery tools that will make sure you spot new VMs coming online.
Then there is management and protection. Managing 40 VMs is one thing, but managing 400 or 4,000 is a rather different proposition. Data protection will need to be planned differently too - all the VMs on a physical server must share its physical I/O capacity, which creates a bottleneck, as does routing all the VM backups through a shared VCB (VMware Consolidated Backup) server or a server using the vStorage APIs for Data Protection (VADP).
Newer technologies will help here. For instance, consider the likes of CDP/RDP (continuous/real-time data protection) and replication. They constantly update a second backup copy of the VM or application data, so they move less data, more often. If done right, CDP/RDP should always give you a valid recovery point that you can guarantee is consistent. Add to that the potential to replicate the protected data to an alternate data centre, and you then have multiple copies of the data sets and the ability to provide fast, consistent recovery with a very good RPO (recovery point objective) at either location. Taking this approach can greatly enhance technologies such as VMware's SRM (Site Recovery Manager) by allowing 'out of band' replication of data, with the ability to roll back in time to recover a consistent data set.
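The essence of CDP is an append-only journal of changes that can be replayed to any past point in time, which is what makes the roll-back capability possible. A minimal sketch, with invented timestamps and names; real products journal block- or byte-level changes rather than whole values:

```python
class ChangeJournal:
    """Minimal continuous-data-protection sketch: every write is logged
    with a timestamp, so any past point in time can be reconstructed."""
    def __init__(self):
        self.log = []  # (timestamp, key, value), append-only

    def record(self, ts, key, value):
        self.log.append((ts, key, value))

    def recover(self, ts):
        """Replay the journal up to `ts` to rebuild the state as it
        stood at that moment - the recovery point."""
        state = {}
        for t, key, value in self.log:
            if t > ts:
                break
            state[key] = value
        return state

j = ChangeJournal()
j.record(100, "orders.db", "v1")
j.record(200, "orders.db", "v2-corrupt")
print(j.recover(150))  # roll back to just before the corruption
```

Because every change is captured as it happens, the achievable RPO is bounded only by how quickly the journal entries reach the second copy.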
Server virtualization will also affect your storage infrastructure. If you want the ability to migrate VMs freely, you will probably want storage virtualization as well as server virtualization. And having more VMs to replicate and backup will require more storage space and bandwidth.
To help here you should look to technologies such as thin provisioning. This saves storage space by only allocating physical capacity as it is actually used, but there are also standalone tools which can reclaim space by automatically inspecting and resizing a VM's file system.
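Thin provisioning works on the same principle as a sparse file: the logical size is promised up front, but physical blocks are allocated only where data is actually written. The sketch below demonstrates this at the file level on a filesystem that supports sparse files (most Linux filesystems do); the `.vmdk` name is purely illustrative:

```python
import os
import tempfile

# Create a "thin-provisioned" file: large logical size, tiny footprint.
path = os.path.join(tempfile.mkdtemp(), "thin.vmdk")  # name is illustrative
with open(path, "wb") as f:
    f.truncate(1 << 30)        # promise 1 GiB; no blocks allocated yet
    f.seek(4096)
    f.write(b"guest data")     # only this region needs physical blocks

st = os.stat(path)
print(st.st_size)              # logical size: 1073741824
print(st.st_blocks * 512)      # physical allocation: far smaller
```

A thin-provisioned virtual disk behaves the same way, which is why a fleet of mostly-empty VMs can consume far less physical storage than their nominal sizes suggest.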
Deduplication is another technology that can help. It understands that it is protecting multiple VMs and avoids storing duplicated data - similar VMs will contain the same operating system files, for example - by looking for repeated patterns in the files and only storing the first such pattern. This reduces the amount of data that must be backed up, with users reporting compression ratios as high as 50:1 in some cases.
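The pattern-matching idea can be sketched with fixed-size chunks: hash each chunk and store only the first occurrence. Real deduplication engines use content-defined chunking and far larger chunks; the tiny byte strings and 4-byte chunks here simply stand in for two VM images that share 'operating system' content:

```python
import hashlib

def dedup_store(images, chunk_size=4):
    """Fixed-size chunk deduplication sketch: each chunk is stored once,
    keyed by its hash; each image becomes a list of chunk references."""
    store, refs = {}, {}
    for name, data in images.items():
        chunks = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)   # keep first occurrence only
            chunks.append(digest)
        refs[name] = chunks
    return store, refs

# Two mock VM images sharing identical "operating system" content
images = {
    "vm1": b"OS-COREOS-CORElog1",
    "vm2": b"OS-COREOS-CORElog2",
}
store, refs = dedup_store(images)
raw = sum(len(d) for d in images.values())
kept = sum(len(c) for c in store.values())
print(raw, kept)  # the deduplicated store is smaller than the raw total
```

Any image can still be reconstructed exactly by joining its referenced chunks in order, which is why deduplication is lossless despite the space savings.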
Deduplication can also reduce the volume of data that must be transferred over the WAN to a secondary data centre for disaster recovery, though for this you need to ensure that the deduplication is done before it is replicated, and not - as some schemes do it - once the data is at rest at the secondary disaster recovery location.
Lastly, you have to bear security in mind. You must ensure that VMs cannot see each other's data unless they are supposed to do so.
Author: Adrian Moir, technical director EMEA, at BakBone Software www.bakbone.com