Downtime can impact businesses in various ways, from losses in data and productivity to missed opportunities and hits to customer experience. But the ultimate impact is cost. According to Gartner, downtime costs businesses an average of $5,600 (just under £4,300) per minute.
I unpack this topic in more depth with my colleague Steve Woodhams, Cloud Solution Architect, in season two, episode 10 of the ColumbusCast Podcast.
What are the top consequences of downtime?
Downtime has gone from something we simply factored in (in other words, planned downtime, scheduled maintenance and so on) to something people don't want at all. Why? Because in the modern world, downtime is something nobody can really afford.
So, as I mentioned earlier, cost is the top consequence of downtime. Just think about the immediate impact: when your systems are down, people can't do their jobs. Then look wider. Most of the digital systems we implement at Columbus deliver digital services to our customers' customers.
In many cases, downtime can damage the relationship with those customers. For example, if a website isn't working or customers can't get through to support, it reflects poorly on your customer service and experience. Ultimately, it can cost you the customer.
How can we minimise the impact of downtime?
You must understand uptime - AKA the opposite of downtime.
One of the things we spend time doing with customers is discussing what's realistic in terms of uptime and how that gets communicated to customers and employees. In most cases, it's unrealistic for a system to be available all the time because we need to do various levels of patching, updates and so on. These planned updates need to happen, so businesses should communicate them clearly to their stakeholders.
Status pages can be an ideal approach to take. Most global cloud-based services have a status page so you can see the service status of all of those services, whether it’s planned or unplanned. This isn’t a bad model to adopt, even for smaller deployments. That level of communication helps with relationships, whether it’s internal or customer-facing.
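To make the idea concrete, here's a minimal sketch of what even a small deployment's status feed could look like. This isn't from the episode; the service names, statuses and format are invented for illustration - real status pages (like those of the major cloud providers) are far richer.

```python
import json
from datetime import datetime, timezone

def render_status(services):
    """Turn a dict of service -> state into a simple status feed
    with a one-line summary people can act on."""
    degraded = [name for name, state in services.items()
                if state != "operational"]
    summary = ("All systems operational" if not degraded
               else f"Issues affecting: {', '.join(sorted(degraded))}")
    feed = {
        "updated": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "summary": summary,
        "services": services,
    }
    return json.dumps(feed, indent=2)

# Hypothetical services; "degraded" might mean planned maintenance in progress
print(render_status({
    "website": "operational",
    "api": "degraded",
    "support-portal": "operational",
}))
```

Even something this simple answers the questions people actually have: what's affected, and when was this last updated.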
People want to know where they are. They want to be aware of the situation, how it might impact them and when it will be resolved. The more information you can provide, the happier customers and employees will be.
What are some common causes of downtime and how can you reduce their impact?
These causes fall into two buckets: planned and unplanned downtime.
Scheduled maintenance is a great example of planned downtime. But there are things you can do to minimise its impact, as Steve points out in the episode: if a system contains many servers, you can build redundancy into it to work around the maintenance window.
Using Dynamics ERP systems as an example, you might look in the backend and find two SQL servers in an active/passive configuration. Updates run on one server while the system uses the backup SQL server; then they switch over and the process reverses, so the system is always up. It might not be working optimally, but it stays available while the maintenance happens.
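The active/passive idea above can be sketched in a few lines. This is purely illustrative: the server names and the availability check are made up, and real Dynamics deployments rely on SQL Server's built-in availability features rather than hand-rolled logic like this.

```python
# Active first, passive second - hypothetical server names
SQL_SERVERS = ["sql-primary", "sql-secondary"]

def pick_server(is_available):
    """Return the first reachable server in the pair, so taking one
    node down for patching just shifts traffic to its partner."""
    for server in SQL_SERVERS:
        if is_available(server):
            return server
    raise RuntimeError("No SQL server available - total outage")

# Normal running: traffic goes to the active node
print(pick_server(lambda s: True))               # -> sql-primary

# While sql-primary is down for updates, connections fail over
print(pick_server(lambda s: s != "sql-primary"))  # -> sql-secondary
```

The point isn't the code, it's the design decision: redundancy has to be architected in up front, which is exactly the choice described below.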
Ultimately, this comes down to the design and architecture of the system. When you deploy a system, you choose how resistant to downtime it's going to be.
As for unplanned downtime examples, two spring to mind for me:
- Human error - as in people doing something unexpected. However, we can design systems in a way that minimises this, e.g. giving people only the permissions they need
- Cyberattacks - reduce your exposure to cyberattacks and data breaches by keeping your systems up-to-date, training your staff and employing round-the-clock monitoring, for starters
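The "only the permissions they need" point above is the least-privilege principle, and a toy sketch shows why it limits damage from unexpected actions. The roles and permission names here are invented for illustration - real systems would use their platform's built-in role-based access control.

```python
# Hypothetical role -> permission mapping; each role gets only what it needs
ROLE_PERMISSIONS = {
    "sales": {"read_orders", "create_orders"},
    "admin": {"read_orders", "create_orders", "delete_orders", "patch_system"},
}

def can(role, action):
    """Check whether a role is allowed to perform an action.
    Unknown roles get no permissions at all (deny by default)."""
    return action in ROLE_PERMISSIONS.get(role, set())

# A salesperson can do their job...
assert can("sales", "create_orders")
# ...but can't accidentally (or maliciously) delete order history
assert not can("sales", "delete_orders")
```

Deny-by-default is the key design choice: an unexpected action fails safely instead of causing an outage.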
Bonus question: Cloud vs on-premise - is one more susceptible to downtime?
This depends on how the system is set up, not where it is. However, public cloud is architected so you don’t have to think about the fundamentals. For example, no-one will go and unplug your server. Microsoft won’t let that happen.
On the flip side, when it comes to cyberattacks, on-prem systems typically aren't exposed to the internet. So the chances of someone maliciously attacking that system remotely are very low - in many cases they'd have to be physically on site to do it. In comparison, the public cloud and its endpoints are - you guessed it - public. So there's a greater attack surface for hackers to get in.
Microsoft and other cloud providers wrap a lot of security around their platforms by default, but if you design something badly, you can still open a hole somewhere. So it all comes down to good architecture and being aware of how much downtime you can tolerate.
Be conscious and deliberate about where you put your efforts to minimise downtime. Know which systems are critical enough that you need them running 24/7.
Steve and I discuss all of the above in far more depth in our episode. So, scroll up to the top of the page or search for ‘ColumbusCast’ in your phone’s podcast app to hear more.
Ensuring your IT systems are evergreen can reduce the impact of downtime
What is evergreen and how can it help businesses stay competitive and compliant (which can reduce the impact of downtime)? Click the button below to tune in to our discussion on the importance of evergreen and continue your journey.