3 Major Pitfalls to Avoid With Your IBM i Disaster Recovery Plan

Guest blog by John Hamel, Principal, TurningPoint Systems


Prolonged downtime is every manufacturing firm’s nightmare. Even companies running the highly reliable IBM i OS recognize that they need a disaster recovery plan in place to protect their business from major disruptions.

But a recovery plan that isn’t sufficient is just as dangerous as downtime itself. Despite the growing focus on disaster recovery and high availability in today’s highly connected supply chain environments, some common pitfalls still create a false sense of security for business owners. In this blog, we will highlight three major pitfalls and discuss what you can do to avoid them.

Pitfall 1: You have an unrealistic estimate of your recovery time objective.

How long can your Power i system be down before your business is seriously impacted? To answer this, business owners will estimate their recovery time objective (RTO) based on a few different factors to come up with what they deem an acceptable level of downtime. I’ve heard estimates that range from a couple of hours to a couple of days.

In the event of unexpected downtime, however, most businesses learn too late that their RTO is grossly underestimated. Why? In today’s increasingly interconnected manufacturing IT environment, our systems are far more integrated than we perceive them to be. These interdependencies make it difficult to predict all the ways that one system will affect another, which leads to estimates that are dangerously unrealistic. Want to get a better handle on how to assess RTO and recovery point objective (RPO)? Check out The Ultimate Guide to Disaster Recovery for a helpful chart of questions to ask.
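To make the effect of interdependencies concrete, here is a minimal sketch in Python. The system names, recovery times and dependency map are all hypothetical, chosen only for illustration. It assumes each system can be restored only after everything it depends on is back, and that the dependency map contains no cycles.

```python
from functools import lru_cache

# Hypothetical standalone recovery times in hours (illustrative numbers only).
STANDALONE_HOURS = {
    "ibm_i_erp": 8.0,
    "warehouse_scanning": 2.0,
    "edi_gateway": 3.0,
    "shipping_labels": 1.0,
}

# Hypothetical dependency map: a system is restored only after these are up.
# Assumed to be free of cycles.
DEPENDS_ON = {
    "ibm_i_erp": [],
    "warehouse_scanning": ["ibm_i_erp"],
    "edi_gateway": ["ibm_i_erp"],
    "shipping_labels": ["warehouse_scanning", "edi_gateway"],
}


@lru_cache(maxsize=None)
def effective_recovery_hours(system: str) -> float:
    """Hours until `system` is usable, including the wait for its dependencies."""
    # The slowest upstream dependency determines when this restore can start.
    upstream = max(
        (effective_recovery_hours(dep) for dep in DEPENDS_ON[system]),
        default=0.0,
    )
    return upstream + STANDALONE_HOURS[system]


for name in STANDALONE_HOURS:
    print(f"{name}: {STANDALONE_HOURS[name]}h alone, "
          f"{effective_recovery_hours(name)}h with dependencies")
```

In this made-up example, a system that looks like a one-hour restore on paper is not usable until twelve hours after the outage begins, because it waits on everything upstream of it.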

Pitfall 2: You’re approaching RTO from an internal perspective.

Which factors are you considering when you estimate your recovery time objective? Many businesses establish their tolerance for downtime in terms of their internal operations: they look to their systems, critical applications and processes to establish their RTO.

But here’s what’s missing from that approach: your customers. How will your customers be affected when your system is unexpectedly unavailable? What is the external impact of orders that are lost? What happens when those orders cannot ship or be received? Customer impact and customer attrition are a critical part of this equation, and they are often overlooked. Approach RTO from the perspective of your customers, and you’re likely to realize your tolerance for downtime is far lower than you originally estimated.

Pitfall 3: You’re relying on tape back-ups.

What may have been a viable option years ago is no longer sufficient by today’s standards. Of all the approaches to disaster recovery, tape back-ups are the most common and perhaps the most concerning. Let’s assume your business needs to restore its Power i machine because it was lost to a physical disaster (e.g., fire or water damage). In those critical moments, you’re likely to discover that even the “best-case” scenario with tape back-ups is dismal: obtaining a new Power i machine, reconfiguring it, and then restoring it from tape will take, at best, 3½ to 4 days. Even more astonishing is the fact that over 30% of all recovered tapes are found not to work.

Tape back-ups can also result in enormous data loss: every transaction recorded between the last backup and the moment of failure is gone. (This gap is what the recovery point objective, or RPO, measures.) When you start to think about the number of transactions that occur even within one hour, let alone half a day, the magnitude of this problem becomes overwhelming. What’s more, companies often find they don’t have a viable way to go back and re-enter all those “unaccounted for” transactions into their system.
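A rough back-of-the-envelope calculation shows how quickly that gap adds up. The sketch below is in Python, and the transaction rate and backup timing are hypothetical figures rather than measurements from any real system. It also assumes a steady transaction rate, which real workloads rarely have.

```python
def transactions_at_risk(transactions_per_hour: float,
                         hours_since_last_backup: float) -> float:
    """Estimate transactions lost if the system fails right now.

    Assumes a constant transaction rate between the last backup and the failure.
    """
    if transactions_per_hour < 0 or hours_since_last_backup < 0:
        raise ValueError("rate and elapsed hours must be non-negative")
    return transactions_per_hour * hours_since_last_backup


# Hypothetical shop: 400 transactions per hour, nightly tape backup,
# failure occurs 12 hours after the backup finished.
lost = transactions_at_risk(400, 12)
print(f"Transactions to re-enter by hand: {lost}")
```

Running the same function with your own hourly volume and backup schedule gives a first estimate of what a tape-only strategy puts at risk.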

A Comprehensive Plan: High Availability and Disaster Recovery as a Service (HA/DRaaS)

For manufacturing firms that worry they have fallen victim to these common pitfalls, there’s good news. Small and mid-size firms are following the same strategy that large global firms have already adopted: a disaster recovery plan that focuses not just on disaster response, but also on high availability. In fact, HA/DRaaS, which uses real-time software replication to a cloud server, is the fastest-growing cloud-based service in the market.

With an HA/DRaaS plan, as changes are made on a production system, they are replicated on a back-up system (ideally in a separate location). Then, in the event of a hardware failure or natural disaster, a “role swap” is initiated so that the back-up system takes over as production with minimal impact to the business. To learn more, read: Should High Availability Be Part of your IBM i Disaster Recovery Plan?
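The replicate-then-swap idea can be sketched as a toy model in Python. The class and method names here are hypothetical and do not correspond to any vendor’s product or API. Real replication software also handles network delays, ordering and verification, all of which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical system holding a copy of the business data."""
    name: str
    role: str  # "production" or "backup"
    records: dict = field(default_factory=dict)


class ReplicatedPair:
    """Toy model: every change on production is copied to the backup."""

    def __init__(self, production: Node, backup: Node) -> None:
        self.production = production
        self.backup = backup

    def write(self, key: str, value: str) -> None:
        # Apply the change on production, then replicate it right away.
        self.production.records[key] = value
        self.backup.records[key] = value

    def swap_roles(self) -> None:
        # Promote the backup to production; the old production becomes backup.
        self.production, self.backup = self.backup, self.production
        self.production.role = "production"
        self.backup.role = "backup"


pair = ReplicatedPair(Node("onsite", "production"), Node("cloud", "backup"))
pair.write("order-1001", "shipped")
pair.swap_roles()  # e.g. after the onsite machine is lost
print(pair.production.name, pair.production.records)
```

Because the change was copied at write time, the promoted system already holds the order when it takes over, which is the property that separates this approach from restoring last night’s tape.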

The fast-growing HA/DRaaS market also reflects a more strategic mindset among manufacturers as they think about protecting themselves. What started as the question “What will it cost me to recover and rebuild?” has evolved into “What’s it going to cost me if I CAN’T recover and rebuild?” and, furthermore, “What’s the true cost of that disruption to my customers and my business?” Once we focus on these questions, the case for an HA/DRaaS plan becomes clear.

Want to learn more about implementing an HA/DRaaS plan? Get started with helpful tips in our popular blog, Key Considerations for Creating the Right Backup Plan for Your Enterprise.

About TurningPoint Systems and Precision Solutions Group, Inc. (PSGi)

Along with partner TurningPoint Systems, PSGi provides high availability DRaaS (disaster recovery as a service) systems. PSGi offers IBM i Managed Services that ensure your systems deliver a secure, reliable environment, while TurningPoint provides the disaster recovery platform that minimizes the financial impact of downtime and data loss.
