IT disaster recovery failures: why aren’t we learning from them?
- Published: Friday, 31 March 2017 07:53
Gil Levonai looks at three areas where many organizations can improve their IT resilience: focussing on why effective recovery requires consistent testing; why backup is not disaster recovery; and how hybrid cloud can provide a safety net.
News of an IT outage hitting a large company seems to appear in the headlines more and more frequently these days, and often the root cause seems to be an out-of-date approach to IT disaster recovery and compliance. Common mistakes that businesses make include not testing the recovery process on a recurring basis and relying on data backups instead of continuous replication. Also, businesses are still putting all their data protection eggs in one basket: it is always better to keep your data safe in multiple locations.
C-level leaders are now realising the need for IT resilience, whether they’re creating a disaster recovery strategy for the first time, or updating an existing one. IT resilience enables businesses to power forwards through any IT disaster, whether it be from human error, natural disasters, or criminal activities such as ransomware attacks. However, many organizations are over-confident in what they believe to be IT resilience; in reality they have not invested enough in disaster recovery planning and preparation. The resulting high-profile IT failures can be used as a lesson for business leaders to ensure their disaster recovery plan is tough, effective, and allows true recovery to take place.
If it ain’t broke… test it anyway
Virtualization and cloud-based advancements have actually made disaster recovery quite simple and more affordable. But it doesn’t stop there: organizations need to commit to testing disaster recovery plans consistently, or else the entire strategy is useless.
This is why the FBI issued guidance in ‘Ransomware Prevention and Response for CISOs’ that urged organizations to “verify the integrity of those backups and test the restoration process to ensure it is working.” The strategy must include being able to recover critical data as quickly and as completely as possible using proper tools and processes. Before performing a live failover on a production environment, IT admins should run a test failover to confirm that user access is set up and configured ahead of time, and to look for possible issues before bringing down the production environment. It may also be useful to perform a live failover on test servers or environments to get a good handle on the process.
Essentially, the disaster recovery site at this point is a separate copy of your live production environment in a ‘sandbox’ test network to prevent any communication to the public network or your production environment. Such non-disruptive disaster recovery testing allows for a full ‘dry-run’ of DR preparedness.
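As an illustration of the integrity check that the FBI guidance calls for, the following is a minimal Python sketch, not a description of any particular product. It assumes a restore test has produced a file copy that can be compared against the original by SHA-256 checksum; the function names and file names are hypothetical and chosen only for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original, restored):
    """True when the restored copy is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Simulated restore test: one intact copy and one truncated copy.
# The file names below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "orders.db"
    good_copy = Path(workdir) / "orders.restored.db"
    bad_copy = Path(workdir) / "orders.truncated.db"

    original.write_bytes(b"critical business data")
    good_copy.write_bytes(b"critical business data")
    bad_copy.write_bytes(b"critical business dat")

    good_ok = verify_restore(original, good_copy)
    bad_ok = verify_restore(original, bad_copy)

print("intact restore verified:", good_ok)
print("truncated restore verified:", bad_ok)
```

A check like this only proves that the restored bytes match. It does not replace a full test failover, which also exercises user access, networking and application start-up.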
Traditional backup is fine, but enterprises don’t want to restore operations to how they were yesterday. It’s not good enough and results in significant revenue loss. Additionally, it is critical to implement and successfully test a rigorous business continuity and disaster recovery strategy that does not rely on the tribal knowledge of individuals required for recovery and can support multiple virtualization, hardware and cloud platforms for flexibility. The C-suite needs to incorporate automated failover and recovery technology with minimal data loss for true IT resilience.
Backup and disaster recovery aren’t the same
Some companies believe that the easiest solution to protect data in a virtual environment is to back up the virtual machines using tools like snapshots or agents. However, this can slow down your production environment and is difficult to scale. The most effective approach to a business continuity/disaster recovery solution is continuous, hypervisor-based replication. Enterprises will then be able to get long-term data retention and archiving out of their disaster recovery solutions, which may render some backup solutions obsolete. Many disaster recovery solutions, for example, have backup-like features, including recovering a single file from a point in time seconds (not hours!) ago, which is more granular than traditional backup. If you can recover data from seconds before an accidental data deletion, for up to 30 days, why would you defer to a 12-hour-old backup? Or in many cases an even older one?
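The gap between the two approaches can be made concrete with a little arithmetic. The Python sketch below is an illustration under stated assumptions, not a model of any specific product: it assumes recovery points are created at a fixed interval, uses the 12-hour backup cycle mentioned above, and picks a 5-second replication checkpoint interval purely as a hypothetical figure. The dates and function names are also invented for the example.

```python
from datetime import datetime, timedelta


def latest_recovery_point(incident, first_point, interval):
    """Most recent recovery point at or before the incident."""
    completed_cycles = (incident - first_point) // interval
    return first_point + completed_cycles * interval


def data_loss_window(incident, first_point, interval):
    """How much work is lost when recovering to the latest point."""
    return incident - latest_recovery_point(incident, first_point, interval)


# Hypothetical timeline: protection starts at midnight and data is
# accidentally deleted the next morning.
first_point = datetime(2017, 3, 30, 0, 0, 0)
incident = datetime(2017, 3, 31, 7, 53, 3)

# 12-hour backup cycle (from the text) versus an assumed 5-second
# replication checkpoint interval.
backup_loss = data_loss_window(incident, first_point, timedelta(hours=12))
replication_loss = data_loss_window(incident, first_point, timedelta(seconds=5))

print("data lost with 12-hour backups:", backup_loss)
print("data lost with continuous replication:", replication_loss)
```

With these assumed numbers the backup loses almost eight hours of changes, while replication loses a few seconds. The worst case for any fixed interval is one full interval, which is why the interval itself is the figure to negotiate with the business.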
Hybrid cloud is your safety net
CIOs should consider a hybrid-cloud strategy that gives businesses another firebreak and secondary environment. Instead of storing all their data on-premises or with a cloud provider only, more and more companies are realising that adopting a hybrid or multi-cloud approach for something like disaster recovery, with the right partners in place, can actually be quite simple and affordable while also serving as a great entry point to the cloud. The perceived complication and expense of transitioning to cloud, which previously held many IT organizations back, is now going away.
IT teams working in the cloud find themselves anticipating issues and moving their data and applications before the damage hits. This sort of proactive movement of data is impossible with a traditional data centre, of course, but for those organizations embracing a virtual, cloud-ready IT environment, it is a reality. In the case of a hack or outage that strikes without warning, organizations can still react within minutes. Without the infrastructure dependencies that prevent easy movement, critical applications can securely live in, and move between, multiple on-premises and cloud environments.
Each time a data centre or IT disaster takes over headlines, CIOs and IT professionals everywhere wince. The IT industry cannot continue with manual systems and legacy backup approaches. Hoping for the best is not a strategy. The key to ensuring uninterrupted operations is improving flexibility and accessibility to the data and applications that run the entire industry. Putting more focus on business continuity and disaster recovery capabilities that use and rigorously test cloud-based infrastructures can make the industry safe, profitable and reliable.
Gil Levonai is CMO at Zerto.