
How to Start Building Your Disaster Recovery Template

Posted by Dave Kravitt, CPIM

Disaster Recovery (DR) is no longer about what it will cost you to recover and rebuild your IT systems. With growing business immediacy and complex interdependencies among IT systems, the question has rightly shifted to what it will cost you if you cannot recover your systems. There’s considerable interest in drawing up a Disaster Recovery template for a business, which is effectively a plan to get the business back on its feet in the face of disaster.

But before you can get to a full-fledged DR template, you have to do some business soul searching and answer a few critical questions. Every business is different, even within the same industry – so it’s important to get into the details of each business process you have in place. That means the first key step to drawing up your DR plan is to fully understand each of your business processes and how their downtime impacts your business.

Evaluating Your Business Processes – Where To Begin?

First, know that this task is not something for the IT team to do alone; it should be led by the business and supported by the IT team. A business team has to sit down with every department and walk through its processes to uncover any factors that might stand in the way of getting the business back up and running.

To that end, it’s critical to assign priorities for each process. Our Ultimate Guide To Disaster Recovery will walk you through additional details on how to do that.

In summary, Tier 1 processes are customer-focused and really cannot afford to be down for any length of time. Tier 2 processes may not need immediate recovery. Tier 3 processes could be important for your employees, but are not critical to the business. And so on.

See a sample DR matrix showing tiered processes.

Keep in mind that you need to consider which business processes are customer-facing and directly affect sales and revenue. Let’s say your customers need to speak to your sales representatives before placing an order, and that’s an ongoing process that draws in daily sales. This is a process that simply cannot stop. That makes it Tier 1, so the applications that enable this process cannot afford downtime.

But do not stop at that.


Explore all related parameters, including interlinked processes and applications, along with the following additional considerations:

Map the process to other business processes: Some processes may at first look like Tier 2 or 3. But if they don’t work, the Tier 1 process can only do so much. For example, your customer-facing, order-taking applications might be up and running. But what if the product is not available at the moment and needs to be sourced from a vendor listed in your procurement system, which happens to be down? The procurement application is not customer-facing, and may be a couple of levels down from Tier 1, but it has now become key to order fulfillment. And this failure speaks directly to your order delivery guarantees and customer satisfaction KPIs.
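One way to make this mapping concrete is to write the dependencies down and let each process inherit the tier of the most critical process that relies on it. The Python sketch below is only an illustration: the process names, the tier numbers, and the `effective_tiers` helper are all hypothetical, and it assumes every dependency also appears in the tier list.

```python
from collections import deque

# Hypothetical processes and the tier each was first assigned (1 = most critical).
assigned_tier = {
    "order_taking": 1,
    "procurement": 3,
    "vendor_catalog": 3,
    "employee_portal": 3,
}

# Hypothetical dependencies: process -> the processes it relies on.
depends_on = {
    "order_taking": ["procurement"],
    "procurement": ["vendor_catalog"],
}


def effective_tiers(assigned_tier, depends_on):
    """Pull every dependency up to the tier of the most critical process using it."""
    effective = dict(assigned_tier)
    # Start from the most critical processes and work outward.
    queue = deque(sorted(assigned_tier, key=assigned_tier.get))
    while queue:
        process = queue.popleft()
        for dependency in depends_on.get(process, []):
            if effective[process] < effective[dependency]:
                effective[dependency] = effective[process]
                queue.append(dependency)  # re-check what it depends on in turn
    return effective


tiers = effective_tiers(assigned_tier, depends_on)
for name, tier in tiers.items():
    print(f"{name}: assigned Tier {assigned_tier[name]}, effective Tier {tier}")
```

In this made-up example, procurement and the vendor catalog both end up at Tier 1 because order taking depends on them, while the employee portal stays at Tier 3.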

Next, ask the following questions:

Which applications are involved? You have to work out each business process, its links to other processes, and the applications involved. Then look at the downtime for each application.

Does the business process have a peak period? Are there times of the day, week, month, or year when the process peaks? For example, in the Tier 1 process above, do more people buy between 4 p.m. and 7 p.m.? On Fridays? And a whole lot more during holidays?

When you have answers to these questions, you will know the specific times when the systems simply cannot go down. You can also identify periods when downtime might even be acceptable. So you might not need 99.99% availability around the clock, only during specific periods. This could help you design a more cost-efficient Disaster Recovery/High Availability setup as well.
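To see what a target like 99.99% means in practice, it helps to turn it into a downtime budget. The sketch below is a rough illustration only: the `allowed_downtime_minutes` helper is hypothetical, and the three-hour peak window simply reuses the 4 p.m. to 7 p.m. example above.

```python
def allowed_downtime_minutes(availability_pct, hours_per_day, days=365):
    """Downtime budget in minutes for a target availability over a time window."""
    window_minutes = hours_per_day * 60 * days
    return window_minutes * (1 - availability_pct / 100)


# 99.99% applied around the clock, every day of the year.
round_the_clock = allowed_downtime_minutes(99.99, hours_per_day=24)

# 99.99% applied only to an assumed daily peak window of three hours.
peak_only = allowed_downtime_minutes(99.99, hours_per_day=3)

print(f"Round-the-clock budget: {round_the_clock:.1f} minutes per year")
print(f"Peak-window budget: {peak_only:.1f} minutes per year")
```

Around the clock, 99.99% leaves roughly 52.6 minutes of downtime a year. If that strict target applies only to the peak window, the rest of the day can be covered by a looser and cheaper target.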

How long can the business tolerate the downtime? This is defined by the Recovery Time Objective (RTO), which is the length of time within which a business process must be restored to avoid unacceptable business consequences. Go through every business process and work out what that duration is. Most of all, beware of setting RTO estimates that are too high – read more in RTO Pitfalls.

What is the operational and financial impact? Now you can get to the numbers. You are aware of the value chain affected when a process cannot be operationalized because of downtime. What does that mean for the business?

  • Are sales or income affected? For example, if the order-taking application went down at the peak time identified, how much in sales would be lost while the systems were unavailable?
  • Does it affect the cash flow? By how much?
  • Will there be increased overheads because you have to deploy additional resources to get things done? How much will that work out to for every hour of downtime?
  • Are there any compliance- or regulation-related challenges?
  • Is the downtime causing a ratings downgrade in a customer system, leading to loss of future revenue? What is your current rating, and at what point will it get affected? And when the rating is downgraded, how much revenue loss does that translate into?
  • What is the impact on customer satisfaction and retention?
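The questions above can be rolled into a simple per-process cost estimate. The Python sketch below is only a starting point: the `DowntimeImpact` class and every figure in it are hypothetical, and it covers only the items that are easy to price (lost sales, extra overhead, and a flat penalty), not harder ones such as retention or ratings.

```python
from dataclasses import dataclass


@dataclass
class DowntimeImpact:
    """Hypothetical hourly cost model for one business process."""
    lost_sales_per_hour: float
    extra_overhead_per_hour: float
    flat_penalties: float = 0.0  # e.g. a one-off compliance or contract penalty

    def cost(self, hours_down: float) -> float:
        hourly = self.lost_sales_per_hour + self.extra_overhead_per_hour
        return hourly * hours_down + self.flat_penalties


# Made-up figures for the same order-taking process at peak and off-peak times.
peak = DowntimeImpact(lost_sales_per_hour=12_000, extra_overhead_per_hour=800,
                      flat_penalties=5_000)
off_peak = DowntimeImpact(lost_sales_per_hour=2_000, extra_overhead_per_hour=800)

peak_cost = peak.cost(hours_down=2)
off_peak_cost = off_peak.cost(hours_down=2)

print(f"Two hours down at peak: {peak_cost:,.0f}")
print(f"Two hours down off-peak: {off_peak_cost:,.0f}")
```

Running the same outage length through peak and off-peak figures shows how much the timing of the downtime changes the cost, which feeds directly into how much the business can tolerate.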

How will the process get done in the event of a disaster? What is your backup plan, if a business application is down? Perhaps you have a process that can happen offline. Often confusion reigns for a while before things start to streamline, and this can cause delays in the business process. If there is an offline process, do the people needed know about it? How well are they trained to execute it? How much time will the process take to complete in this offline manner? What impact does all this have on the customer? How will it be communicated to the customer?

Are there parts of the process that simply cannot be done offline?

Is there a way to record the transactions happening during downtime? How will they be recorded into the systems once recovery has happened? What if this doesn’t happen?

These questions will help you answer whether you have a robust manual process so work continues as usual, and there is no missing data. They help you identify the elements which are fully dependent on an online system. You will also be able to know what needs to be done to bolster your manual process, so it is robust and seamless when disaster hits. You might need additional resources, certifications and experts, which you will have to factor in as your costs to overcome downtime.

What if your partner’s system is not available? At this point, you have covered a lot of ground on what’s happening within your company. But your partners, suppliers, or vendors may face downtime as well. How is your business geared up for that? Will you have manual processes to take over? Do you have a rating system that ensures they keep their systems highly available? How much revenue do you lose if they don’t come back online quickly? What’s the business’s tolerance for that?

While the partner system assessment will not form an immediate part of your DR plan, looking at processes that have dependencies on external applications will help you design a plan that mitigates the risks associated with their downtime.

Once you have viewed your business processes, not just from a bird’s eye view but also from a worm’s eye view, you can detect the areas that leave your business vulnerable to risks associated with downtime. You will also have a good handle on the costs attached to downtime and the business’s tolerance for those costs. Once you know what is acceptable and what is not, your Disaster Recovery plan becomes easier to build and test.

