Virtually eliminating downtime

Business continuity and disaster recovery are two key priorities for IT managers. With an increasing number of public services reliant on IT systems, minimising planned and unplanned downtime has become critical to the level of service delivered by public sector organisations. Increasingly, IT departments are turning to virtualisation to deliver a more cost-effective, straightforward and reliable platform for keeping systems up and running.
    
Although often confused, Business Continuity (BC) and Disaster Recovery (DR) are distinct but closely related disciplines. BC focuses on keeping systems up and running and ensuring the IT infrastructure is resilient to failure, whereas DR is concerned with an organisation’s ability to react when failure or disaster occurs.

Responding to disaster

The term ‘disaster’ can summon up images of floods, fires and earthquakes, but for most companies a ‘disaster’ can be caused by something as mundane as hardware failure or software corruption. For occurrences as commonplace as these, a company must have a response in place.
    
A proper DR strategy is a must for almost any organisation’s IT systems, but previously many have found the cost of purchasing redundant systems and high availability software to be prohibitive. In fact, the typical company spends 70 to 80 per cent or more of its IT budget simply on ensuring ‘the lights are on’, leaving the business unable to support new strategic initiatives.
    
Justifying expenditure on DR should, in theory, be relatively simple – for many companies, even a few hours’ downtime can prove more expensive than the technology that might have averted it. However, the very nature of disaster recovery systems is that a company hopes it will never have to use them, so it can be difficult to justify outlay on technology that may sit idle when the business has other, more pressing requirements.
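
The comparison between the cost of downtime and the cost of DR technology can be made concrete with a simple calculation. The sketch below is illustrative only: the hourly downtime cost and the DR investment are hypothetical figures, not numbers from any organisation discussed here, and the function names are invented for this example.

```python
def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Cost of a single outage: duration multiplied by hourly cost."""
    return hours_down * cost_per_hour


def breakeven_hours(dr_investment: float, cost_per_hour: float) -> float:
    """Hours of avoided downtime at which the DR spend pays for itself."""
    if cost_per_hour <= 0:
        raise ValueError("cost_per_hour must be positive")
    return dr_investment / cost_per_hour


# Hypothetical figures, for illustration only.
HOURLY_COST = 5_000.0      # assumed cost of one hour of downtime
DR_INVESTMENT = 20_000.0   # assumed one-off spend on DR technology

outage = downtime_cost(hours_down=6, cost_per_hour=HOURLY_COST)
hours_needed = breakeven_hours(DR_INVESTMENT, HOURLY_COST)
print(f"A six-hour outage costs {outage:,.0f}")
print(f"The DR spend is recovered after {hours_needed:.1f} hours of avoided downtime")
```

With these assumed figures, a single six-hour outage costs more than the DR investment, which is the argument in its simplest form. A real business case would also need to account for ongoing running costs and the likelihood of an outage occurring at all.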

Making do
Many organisations, particularly smaller ones without significant IT budgets, have had to survive with ‘make-do’ approaches to DR. They might have invested in tape backups to ensure data could be recovered. However, these organisations would often be unable to thoroughly test recovery procedures and would have been reliant on support contracts from hardware partners to help them through any kind of significant incident.
    
Part of the reason disaster recovery has previously been so tricky and expensive to implement is the lack of flexibility inherent in physical approaches to computing. Applications have traditionally been tied to individual pieces of hardware in the data centre, so recovering an application in a DR scenario required an exact replica of the hardware, application and operating system. Not only does this require a sizeable financial investment, but the time spent keeping software patches and configurations identical on the source and DR target machines is also significant.
    
Complex tools and processes such as system imaging tools, tape backups and restore processes for systems and applications have typically required specialised skills and resources. Furthermore, these traditional methods of recovery such as tape or system image recovery have a high rate of failure, compromising an organisation’s ability to recover.
    
This complexity and cost are why more organisations are taking a virtualised approach to DR. Virtualisation has for some time been associated with server consolidation and reducing IT Capital Expenditure (CapEx), but the benefits extend far beyond the cost savings end-users can achieve by reducing their hardware footprints. The natural next step in virtualisation is reducing the day-to-day running costs of IT, namely Operational Expenditure (OpEx), specifically around initiatives such as DR.

The virtual environment
To understand why DR is so much easier in a virtual environment it is necessary to consider the specific attributes of virtual machines and how they differ from physical ones. Virtual machines are essentially files that encapsulate an application and operating system stack. Like any other file – a word processing document or MP3 – virtual machines can be copied and are completely portable. Furthermore – because they run on top of the hypervisor – they are hardware agnostic, meaning they can be deployed on almost any x86 server provided it is virtualised.
    
From a DR standpoint, the characteristics of virtual machines – encapsulation and portability – make them incredibly simple to back up, replicate and restore. Protecting a complete system is just a matter of protecting a few files. The files that make up a virtual machine can be recovered to any hardware without requiring any changes because virtual machines are hardware-independent. These characteristics mean that tasks such as server migration, backup and recovery, and replication can be treated as simple data migrations or file copies.
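
To illustrate the idea that protecting a system becomes a matter of protecting a few files, here is a minimal sketch that copies a virtual machine's directory and verifies each copy by checksum. It assumes the virtual machine is powered off and stored as ordinary files in a single directory; the function names and directory layout are hypothetical and do not correspond to any particular virtualisation product. A running virtual machine would normally need the platform's own snapshot or backup tooling to produce a consistent copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_vm(vm_dir: Path, backup_root: Path) -> Path:
    """Copy every file of a powered-off VM and verify each copy.

    Assumes (hypothetically) that the VM lives entirely in vm_dir.
    """
    target = backup_root / vm_dir.name
    shutil.copytree(vm_dir, target)  # raises if the target already exists
    for source_file in vm_dir.rglob("*"):
        if source_file.is_file():
            copied = target / source_file.relative_to(vm_dir)
            if sha256_of(source_file) != sha256_of(copied):
                raise IOError(f"checksum mismatch for {copied}")
    return target
```

Restoring is the same operation in reverse: copy the directory to storage that a virtualised host can reach, then register the virtual machine with that host.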
    
Because these properties are inherent in virtual machines, any organisation – regardless of size – can benefit from this increased flexibility. A small customer with only three or four host machines suddenly has a very simple way of dealing with hardware failure; if a host machine goes down then the virtual machines it was supporting can simply be brought up on another machine. The more advanced virtualisation platforms can automate this process – so virtual machines will automatically be moved away from failing hardware and restarted without user intervention.
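
The automated restart described above can be pictured as a placement problem: when a host fails, each of its virtual machines needs a surviving host with enough spare capacity. The sketch below is a simplified illustration and not the algorithm used by any specific product. It considers memory only, and the `Host` class, host names and sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    memory_gb: int
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> memory in GB

    def free_gb(self) -> int:
        return self.memory_gb - sum(self.vms.values())


def restart_after_failure(hosts: list[Host], failed_name: str) -> dict[str, str]:
    """Move VMs from the failed host to the survivor with the most free memory.

    Returns a mapping of VM name -> new host name. VMs that fit on no
    surviving host stay on the failed host and are omitted from the result.
    """
    failed = next(h for h in hosts if h.name == failed_name)
    survivors = [h for h in hosts if h.name != failed_name]
    placements: dict[str, str] = {}
    # Place the largest VMs first so small ones do not crowd them out.
    for vm, size in sorted(failed.vms.items(), key=lambda item: -item[1]):
        candidates = [h for h in survivors if h.free_gb() >= size]
        if not candidates:
            continue
        target = max(candidates, key=Host.free_gb)
        target.vms[vm] = size
        del failed.vms[vm]
        placements[vm] = target.name
    return placements


# Hypothetical three-host environment.
hosts = [
    Host("host-a", 64, {"web": 16, "db": 32}),
    Host("host-b", 64, {"mail": 24}),
    Host("host-c", 32),
]
print(restart_after_failure(hosts, "host-a"))
```

In this example the failed host's two virtual machines are spread across the two surviving hosts. A production platform would also weigh CPU, storage access and network connectivity before restarting anything.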
    
Previously, hardware failure could have brought down a system for hours or days while replacement hardware was sourced and installed. In a virtual environment this downtime can be limited to minutes, or even avoided completely in many scenarios by using virtualisation technologies that go further and enable fault tolerance.

Taking away the headache
Testing disaster recovery strategies has always been a significant headache. Ideally every IT department would like to be able to test its DR and failover capabilities in full on a regular basis, but the potential disruption and risk rule such an approach out. Virtualisation makes disaster recovery testing very simple and, because of the isolation between virtual machines, non-disruptive. DR servers or complete virtualised DR sites can easily be tested without having to bring production servers offline, thereby removing the risk that the test could disrupt the wider business.
    
From a cost perspective, virtualisation delivers significant benefits over physical approaches to DR. One strategy many organisations use is to take older hardware that has been decommissioned after a server consolidation initiative and repurpose it for DR. Instead of having to justify expenditure on DR hardware, the IT department can effectively source this hardware from within its own data centre.
    
This DR hardware can also be made to work far more efficiently by virtualising it. Virtualised DR hardware can run multiple workloads, unlike with physical environments where there is a requirement for 1:1 mapping of resources. Effectively, one server can quite easily act as a disaster recovery platform for as many as 15 or 20 applications.
    
Virtualisation is also at the forefront of the cloud computing movement: its ability to provide a uniform IT platform appeals to customers and cloud providers who don’t want to re-architect applications to work on proprietary cloud infrastructures. Cloud computing can be used to offer on-demand DR platforms; the moment an outage occurs, services can fail over to virtual infrastructure provided by a cloud provider. This approach has proven popular with smaller public sector organisations without a secondary site available for DR.
    
A similar approach can be adopted by public sector organisations moving to a shared-services model. In the same way that a cloud provider can offer their virtual infrastructure as a DR location, two or more public sector organisations can collaborate to offer each other IT capacity in the event of an outage. Again, the fact that both organisations will be sharing a common virtualisation platform means that compatibility issues will not compromise this approach, while costs can be shared between partners.
    
Virtualisation vendors and those companies that specialise in DR software and storage solutions are now looking at ways of extending the capabilities of virtual infrastructures to better support customers’ DR initiatives. With so much of the complexity taken out of the recovery process, the future for virtualised DR is focused on bringing even more automation to the various tasks IT departments perform to protect critical applications. More advanced technologies now allow users to build complete DR workflows for production environments and automate the failover process between primary and secondary sites.

Business continuity
In terms of business continuity, virtualisation also has a significant role to play, although many of the traditional concerns here remain. These include things like UPS and network redundancy which all help to provide a more resilient infrastructure. However, one area where virtualisation can significantly improve an organisation’s BC capabilities is around planned downtime.
    
Planned downtime typically accounts for over 80 per cent of data centre downtime. Hardware maintenance, server migration and firmware updates all require downtime for physical servers. To minimise the impact of this downtime, organisations are forced to delay maintenance until inconvenient and difficult-to-schedule downtime windows. In today’s 24/7 workplace this is sometimes no longer feasible.
    
One of the most popular technologies the advanced virtualisation platforms deliver allows live-running virtual machines to be moved between hosts without downtime. This means that when maintenance is required on a piece of hardware, workloads can be moved off that host machine non-disruptively, allowing the work to take place. Not only does this eliminate downtime for common maintenance operations but it also means that maintenance work can be performed at any time without disrupting users and services.
    
In today’s environment, a robust IT infrastructure is no longer a luxury but a necessity. Until recently, making this a reality for most IT environments was prohibitively expensive and technically very challenging. As an increasing number of organisations standardise on virtual infrastructure, DR and BC have become simpler and more cost-effective, giving public sector IT departments the ability to build strategies and workflows that protect systems from the multitude of threats they face.

Case study: The Pensions Regulator
The Pensions Regulator is the UK regulator of work-based pension schemes. It came into being in April 2005, replacing the Occupational Pensions Regulatory Authority (OPRA). It has 300 employees, primarily working out of its headquarters in Brighton, and works closely with a range of stakeholders including the Pension Protection Fund (PPF).
   
As a non-departmental public body, The Pensions Regulator is required to operate to strict budgets across the organisation. In line with this, in 2007 the IT department initiated a review of its IT infrastructure, which was becoming expensive and resource-intensive to maintain. The Pensions Regulator began to investigate how virtualisation technology could help solve the cost, capacity and availability issues being experienced by the organisation.
   
Since installing VMware Infrastructure, The Pensions Regulator has benefited from a significantly higher level of availability across its infrastructure. VMware DRS, VMotion and HA are used extensively to provide business continuity and disaster recovery as standard to every virtual machine in the infrastructure. VMware HA is used to restart virtual machines automatically on a different host machine should any hardware fail, while VMotion has helped the organisation eradicate planned downtime by moving workloads away from physical machines requiring maintenance. VMware has allowed The Pensions Regulator to reduce its disaster recovery times for critical applications from around a day to as little as one hour.
   
“Perhaps the most critical service we support is a Pensions Web Portal ‘Exchange’ for pensions scheme administrators across the country, and we have already taken steps towards a cloud approach to support this using VMware,” said Ray Heffer, technical infrastructure manager at The Pensions Regulator. “Our hosting partner provides the physical infrastructure along with VMware Infrastructure that is required to support a 24x7 website, such as UPS, generator backup and resilient diverse fibre and microwave high-performance internet connectivity, but crucially we can monitor and maintain the virtual machines running on that infrastructure centrally, as if they are running within our own data centre. This has been such a success that we are looking at using this hosting facility for offsite disaster recovery purposes in the future.”
