Gaining an understanding of the kind of impact an unplanned outage or disaster can have on an IT environment is crucial to selecting which technology must be deployed to reduce or mitigate that impact. This is usually done through a business impact analysis (BIA).
Experienced consultants typically write BIA reports in conditional language. Wording such as "could potentially generate losses of up to..." or "would not meet the current regulatory requirements…" is often used to describe the impact. Conditional wording is common practice because the impact analysis only has meaning if a specific condition exists; that is, a disaster must actually occur. Without the prospect of a disaster, the entire BIA has little meaning.
However, establishing the probability of a disaster is often viewed as a daunting task that requires research, time and budget that SMBs usually don't have. That said, SMBs cannot afford to ignore risk or, conversely, to protect themselves from every risk known to mankind. This tip offers some guidance on conducting a risk assessment without spending as much as a disaster might cost.
What is a risk?
The business continuity industry defines the term risk as, "the weighing of an identified threat against the probability of it occurring." At a high level, we must identify existing threats, our possible exposures and, to some degree, the likelihood of the threat materializing.
Understanding the IT environment
The purpose of this step is to understand what the IT infrastructure looks like. This includes systems, applications, network, facilities, etc. Without a good understanding of the IT environment, it is not possible to identify specific strengths (controls) and weaknesses (vulnerabilities), rendering the risk assessment process somewhat speculative.
Identify threats
Both internal and external threats must be identified. For example, external threats may include earthquakes, hurricanes or tornadoes, fire and flooding. Power outages and terrorism can also be external threats. Internal threats may include hardware failure, sabotage, hacking or accidental data deletion or corruption. Threats will then be measured against vulnerabilities.
Identify vulnerabilities or exposures
These are the weaknesses in the environment that create an exposure to the threats previously identified. This is where a solid knowledge of the IT infrastructure comes into play. The source of each vulnerability should be documented so it can be addressed.
Review controls in place
The controls are the measures already in place that could reduce or mitigate the impact. For example, accidental data deletion can constitute a threat, and having the data stored on a RAID 0 disk array could be the vulnerability, but having nightly backups in place would be the control. At this stage, it is important not to fall for the "we could do this or that" workaround syndrome. Only documented, tested or planned controls should be included.
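One way to keep threats, vulnerabilities and controls together is a simple risk register. The following is a minimal sketch, not a prescribed format: the class names, the `status` field and the "idea" status are assumptions made for illustration. It records the example above and filters out controls that are not documented, tested or planned.

```python
from dataclasses import dataclass, field
from typing import List

# Statuses that qualify a control for the assessment (taken from the text above).
ACCEPTED_STATUSES = {"documented", "tested", "planned"}


@dataclass
class Control:
    description: str
    status: str  # hypothetical field: "documented", "tested", "planned" or "idea"


@dataclass
class RegisterEntry:
    threat: str
    vulnerability: str
    controls: List[Control] = field(default_factory=list)

    def accepted_controls(self) -> List[Control]:
        """Return only the controls that are documented, tested or planned."""
        return [c for c in self.controls if c.status in ACCEPTED_STATUSES]


entry = RegisterEntry(
    threat="accidental data deletion",
    vulnerability="data stored on a RAID 0 disk array",
    controls=[
        Control("nightly backups", "tested"),
        # A "we could do this or that" workaround; it should not count.
        Control("we could copy files back from a laptop", "idea"),
    ],
)

for control in entry.accepted_controls():
    print(control.description)
```

Only the nightly backups survive the filter, which mirrors the rule that improvised workarounds do not belong in the assessment.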
Determine likelihood or probability
This is where many organizations tend to lose control of the process, as it can start looking like an insurance underwriter's exercise. The idea is not to calculate statistical probabilities such as: "There exists a 42% probability a major earthquake will hit Southern California within the next 10 years."
In order to determine the likelihood, we must look at the output from the previous steps, including:
- Threat identified
- Nature of the vulnerability
- Existence and effectiveness of the control(s) in place
The output of the likelihood determination can simply be rated as high, medium or low. This output will then be used for the risk determination.
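The rating can be kept deliberately simple. The sketch below is one possible rule, not an industry standard: it assumes the vulnerability has been given a severity of low, medium or high, and that a control lowers the rating by one level if it is partial and by two if it is effective. The function name and the effectiveness labels are hypothetical.

```python
LEVELS = ["low", "medium", "high"]

# Assumed number of levels each kind of control removes from the rating.
CONTROL_REDUCTION = {"none": 0, "partial": 1, "effective": 2}


def rate_likelihood(vulnerability_severity: str, control_effectiveness: str) -> str:
    """Rate likelihood as low, medium or high for one identified threat."""
    start = LEVELS.index(vulnerability_severity)
    reduced = start - CONTROL_REDUCTION[control_effectiveness]
    # Never go below the lowest rating.
    return LEVELS[max(reduced, 0)]


print(rate_likelihood("high", "none"))
print(rate_likelihood("high", "partial"))
print(rate_likelihood("medium", "effective"))
```

A severe vulnerability with no control stays high, while the same vulnerability with a partial control drops to medium.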
Determine impact
The output from the BIA is used to understand the financial impact of a threat exercising a weakness or vulnerability. We must keep in mind that the impact cannot always be measured in terms of immediate financial losses and can be less tangible, such as a tarnished reputation. In such instances, the impact can also be rated as high, medium or low as in the previous step.
Determine risk
Using the output from the likelihood assessment combined with the impact rating, we can assign a ranking to the risk. The diagram below shows how the combination of probability and impact is used in determining how risk ranks in order of priority and the actions that need to be taken. For example, a risk with low impact and low probability of occurrence may result in no specific action being taken. Conversely, a highly probable and high impact risk will command action.
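The combination of likelihood and impact can be sketched as a small lookup. The weights and thresholds below are assumptions chosen for illustration, not values taken from this tip; they only respect the two cases stated above (low and low ranks lowest, high and high ranks highest). The sample risks and their ratings are also made up.

```python
# Hypothetical numeric weights for the three ratings (an assumption).
SCORE = {"low": 1, "medium": 2, "high": 3}


def rank_risk(likelihood: str, impact: str) -> str:
    """Combine a likelihood rating and an impact rating into a risk ranking."""
    product = SCORE[likelihood] * SCORE[impact]
    if product >= 6:
        return "high"
    if product >= 3:
        return "medium"
    return "low"


# Illustrative risks only: (description, likelihood, impact).
risks = [
    ("hardware failure", "medium", "medium"),
    ("accidental data deletion", "low", "low"),
    ("power outage", "high", "high"),
]

# Sort so the highest-scoring risks come first.
prioritized = sorted(
    risks, key=lambda r: SCORE[r[1]] * SCORE[r[2]], reverse=True
)

for name, likelihood, impact in prioritized:
    print(name, rank_risk(likelihood, impact))
```

Multiplying the two ratings is only one way to build the ranking; a hand-written table of nine cells works just as well and may be easier to explain to business stakeholders.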
Plan and act
This is the final step of the process. Guided by the knowledge of threats and vulnerabilities identified, the resulting risk, the understanding of the IT environment and the current controls in place, a plan can be devised. At a high level, this plan should at least include:
- Actions that will be taken regarding each risk identified, which can vary from acceptance to elimination
- Proposed controls to mitigate the risk (e.g., high-availability, DR strategy, security, etc.)
- A cost/benefit analysis of the proposed controls versus potential losses
- Design of the strategy
- Implementation plan for the strategy
Unlike many other IT decisions, risk management and disaster recovery planning in general are not expenditures that can be supported by an ROI calculation. You do not invest in risk management; you use it to protect your investment. That said, we must always remember that the cost of a recovery strategy should never exceed the losses it is designed to prevent.
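The cost rule above can be checked with a one-line comparison. This is a minimal sketch: the function name is hypothetical, and the proposals and dollar figures are invented purely to show the comparison, with the potential loss assumed to come from the BIA.

```python
def control_is_justified(control_cost: float, potential_loss: float) -> bool:
    """Return True when the control costs no more than the loss it is meant to prevent."""
    if control_cost < 0 or potential_loss < 0:
        raise ValueError("costs and losses must be non-negative")
    return control_cost <= potential_loss


# Invented figures: (cost of the proposed control, potential loss from the BIA).
proposals = {
    "offsite replication": (40_000, 250_000),
    "second data center": (900_000, 250_000),
}

for name, (cost, loss) in proposals.items():
    verdict = "justified" if control_is_justified(cost, loss) else "costs more than the loss"
    print(f"{name}: {verdict}")
```

With these figures, the replication proposal passes and the second data center does not, since it would cost more than the loss it is designed to prevent.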
Pierre Dorion is the Data Center Practice Director and a Senior Consultant with Long View Systems Inc. in Phoenix, AZ, specializing in the areas of business continuity and disaster recovery planning services, and corporate data protection.
This was first published in December 2008