The recovery time objective (RTO) is the maximum amount of time a system or application supporting one or more critical business processes can be interrupted before the business is negatively impacted. It is generally accepted that an outage exceeding a company's RTO can be considered a disaster (see the DRJ Glossary).
It is important to note that an RTO is not the same as a service-level agreement (SLA). SLAs are usually defined for a single application or service and can be much more aggressive than an RTO, as they typically include planned outages and often assume the availability of other IT infrastructure components. Additionally, the consequences of a missed SLA are often not as severe as a missed RTO.
Where to start
The financial impact of a business interruption is probably the most important consideration when establishing the RTO. The potential financial losses from an interruption dictate not only how quickly a business needs a function or service to resume, but also how much money should be invested in technology that will help keep an outage from exceeding that time frame.
An internal review of the revenue-generating business activities and their supporting IT infrastructure will help establish the criticality of the systems and, therefore, their respective recovery requirements.
Critical business processes must first be identified
These processes are often referred to as "core" functions and are usually tied directly to the revenue stream. Interrupting them usually means that some or all key services are unavailable to customers or end users, which is why criticality is most often measured in terms of lost revenue and other possible financial losses.
Once the criticality of business processes has been clearly established, their respective supporting IT infrastructure must also be identified, including servers, storage, network, etc. Ultimately, these are some of the elements you will need to recover in order to resume a given business process.
Probably the most difficult, but also the most important, part of defining the RTO is calculating the losses that could result from an unplanned interruption. These losses are difficult to calculate precisely because they are not always a simple matter of adding up the sales lost while the network or a server was down.
Losses can depend on calendar cycles; for example, an outage during a peak sales period costs more than one during a slow period. Losses can also be less immediate and tangible than missed sales or opportunities, such as damage to the company's reputation. To further complicate matters, the probability of an unplanned interruption must also be taken into consideration.
Nonetheless, estimating potential losses can't be overlooked, because the sum of those losses will impose a cap on how much you can spend on preventing them. In other words, the cost of a recovery strategy should never exceed the losses it is designed to prevent.
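To make the spending cap concrete, the sketch below weighs potential losses by their probability, a common simplification of risk (expected loss = probability x impact), and checks whether a recovery strategy's yearly cost stays under that cap. The `LossEstimate` class, the `strategy_is_justified` helper and all figures are hypothetical, and the sketch assumes the strategy would avoid the outage losses entirely.

```python
from dataclasses import dataclass


@dataclass
class LossEstimate:
    """Hypothetical loss inputs for one critical business process."""
    hourly_loss: float         # estimated loss per hour of outage (sales, penalties, etc.)
    outage_hours: float        # expected outage duration without a recovery strategy
    annual_probability: float  # estimated probability of such an outage in a given year

    def expected_annual_loss(self) -> float:
        # Simplified risk formula: probability x financial impact of the outage.
        return self.annual_probability * self.hourly_loss * self.outage_hours


def strategy_is_justified(annual_strategy_cost: float, estimate: LossEstimate) -> bool:
    """A recovery strategy should not cost more than the losses it is meant to prevent.

    Assumes (hypothetically) that the strategy prevents the whole expected loss.
    """
    return annual_strategy_cost <= estimate.expected_annual_loss()


estimate = LossEstimate(hourly_loss=20_000, outage_hours=48, annual_probability=0.05)
print(estimate.expected_annual_loss())          # 48000.0
print(strategy_is_justified(30_000, estimate))  # True
print(strategy_is_justified(60_000, estimate))  # False
```

In practice the hourly loss would itself vary with calendar cycles and include harder-to-quantify costs such as reputational damage, so these inputs are estimates rather than exact figures.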
Establishing the RTO for a specific application also involves identifying dependencies that may influence or affect recovery. For example, if an application must be recovered within 12 hours to allow end-user access, the network components and the security and authentication mechanisms must also be in place or recovered within the same time frame.
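One way to check the dependency requirement is to compute when each component becomes usable, given the components it depends on. The sketch below is a minimal model under stated assumptions: a component can only be rebuilt after all of its dependencies are up, independent components are recovered in parallel, and there are no circular dependencies. The component names and recovery times in `RECOVERY_HOURS` and `DEPENDS_ON` are hypothetical.

```python
from functools import lru_cache

# Hypothetical worst-case recovery estimates (hours) for each component.
RECOVERY_HOURS = {
    "network": 2,
    "authentication": 3,
    "database": 5,
    "order_app": 4,
}

# Hypothetical dependencies: each component needs these to be up before it can be rebuilt.
DEPENDS_ON = {
    "network": [],
    "authentication": ["network"],
    "database": ["network"],
    "order_app": ["authentication", "database"],
}


@lru_cache(maxsize=None)
def ready_time(component: str) -> float:
    """Hours until a component is usable: the latest dependency finishes, then it is rebuilt.

    Assumes no circular dependencies and that independent components recover in parallel.
    """
    start = max((ready_time(dep) for dep in DEPENDS_ON[component]), default=0)
    return start + RECOVERY_HOURS[component]


def meets_rto(component: str, rto_hours: float) -> bool:
    return ready_time(component) <= rto_hours


print(ready_time("order_app"))     # 11 (network 2 + database 5 + application 4)
print(meets_rto("order_app", 12))  # True
```

The application's own four-hour rebuild fits easily within a 12-hour RTO on its own; it is the chain of supporting components that determines whether the target is actually met.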
Consider creating a documented manual process or contingency plan that allows a function to be performed temporarily. Such a workaround can buy time and potentially allow a less aggressive RTO.
Recovery planning must also account for the time required for notification, response, situational assessment, procurement and rebuild in the event of a disaster. Assuming that an RTO will be met simply because there is not much data to restore creates a false sense of security. The RTO must be achievable in a worst-case scenario.
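The following sketch adds up the recovery phases listed above to show why a small data set does not guarantee a short recovery. It assumes the phases run one after another; the phase durations in `WORST_CASE_PHASES` are hypothetical worst-case estimates, not benchmarks.

```python
# Hypothetical worst-case durations (hours) for each recovery phase.
WORST_CASE_PHASES = {
    "notification": 1.0,
    "response": 2.0,
    "situational_assessment": 3.0,
    "procurement": 24.0,  # e.g. replacement hardware must be ordered and delivered
    "rebuild": 8.0,
    "data_restore": 0.5,  # small data set, so the restore itself is quick
}


def worst_case_recovery_hours(phases: dict[str, float]) -> float:
    """Total recovery time, assuming the phases happen sequentially."""
    return sum(phases.values())


def rto_is_achievable(rto_hours: float, phases: dict[str, float]) -> bool:
    return worst_case_recovery_hours(phases) <= rto_hours


print(worst_case_recovery_hours(WORST_CASE_PHASES))  # 38.5
print(rto_is_achievable(12, WORST_CASE_PHASES))      # False
```

Although the data restore takes only half an hour in this example, procurement and rebuild push the worst-case total well past a 12-hour RTO.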
Once you understand what losses could be incurred should a critical business process be interrupted and you have identified the IT infrastructure supporting that process, you will have an idea of the recovery requirements. A compromise between the cost of the recovery solution and the potential losses will dictate the actual RTO.
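The compromise between cost and potential losses can be sketched as picking the recovery option with the lowest combined yearly cost of protection plus expected outage losses. This is a simplified model, not a prescribed method: it assumes losses accrue linearly for each hour of downtime and that each option's worst-case recovery time equals its achievable RTO. The `RecoveryOption` class, the option names and every figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    """A hypothetical recovery solution and the RTO it can achieve."""
    name: str
    rto_hours: float    # worst-case recovery time the option can deliver
    annual_cost: float  # yearly cost of the technology and services


def expected_annual_loss(rto_hours: float, hourly_loss: float, annual_probability: float) -> float:
    # Assumes losses accrue linearly for every hour the business process is down.
    return annual_probability * hourly_loss * rto_hours


def best_option(options, hourly_loss, annual_probability):
    """Return the option with the lowest cost of protection plus expected loss."""
    return min(
        options,
        key=lambda o: o.annual_cost
        + expected_annual_loss(o.rto_hours, hourly_loss, annual_probability),
    )


options = [
    RecoveryOption("tape restore at recovery site", rto_hours=72, annual_cost=10_000),
    RecoveryOption("warm standby", rto_hours=12, annual_cost=40_000),
    RecoveryOption("synchronous replication", rto_hours=1, annual_cost=250_000),
]
choice = best_option(options, hourly_loss=20_000, annual_probability=0.1)
print(choice.name, choice.rto_hours)  # warm standby 12
```

With these made-up numbers, the cheapest option leaves too much exposure and the fastest one costs more than the losses it would prevent, so the middle option sets the RTO at 12 hours.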
About the author: Pierre Dorion is the Data Center Practice Director and a Senior Consultant with Long View Systems Inc. in Phoenix, AZ, specializing in the areas of Business Continuity and Disaster Recovery Planning Services and Corporate Data Protection. Over the past 10 years, he has focused primarily on the development of Recovery Strategies, IT Resilience and Recoverability as well as data protection and availability engagements at the Data Center level.
Do you know a helpful storage tip, timesaver or workaround? Email the editors to talk about writing for SearchSMBStorage.com.
This was first published in August 2008