By Akash Lal, Senior Partner, and Mrityunjay Ravi Iyer, Associate Partner, McKinsey & Company
In today’s digital world, businesses rely on technology to deliver seamless ‘all day, every day’ services and maintain customer relationships. Keeping IT systems ‘up’ has therefore become imperative amid increasingly sophisticated technology environments, frequent software upgrades, a multiplicity of vendors, and cyber threats.
In the previous year, there was a 75% increase in intrusions into cloud-hosted environments, and that number is rising as companies increasingly move to the cloud.
The financial and operational impacts of technology outages can be severe: a recent Gartner study found that the average cost of IT downtime is $5,600 per minute, with some industries experiencing even higher costs. These financial implications can result from lost productivity, customer dissatisfaction, and potential legal liabilities.
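To put the per-minute figure in perspective, the short Python sketch below simply multiplies it out for a few outage durations. The function name and the chosen durations are illustrative assumptions; the only input taken from the text is the $5,600-per-minute average, and actual costs will vary by industry and company.

```python
# Average downtime cost cited above; real figures vary by industry.
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated cost of an outage lasting `minutes` minutes."""
    return minutes * cost_per_minute


# Illustrative durations (assumptions, not from the study).
for label, minutes in [("15 minutes", 15), ("1 hour", 60), ("1 day", 24 * 60)]:
    print(f"{label}: ${downtime_cost(minutes):,.0f}")
# 15 minutes: $84,000
# 1 hour: $336,000
# 1 day: $8,064,000
```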
From a management perspective, a proactive approach to technology resiliency could involve six initiatives: constant assessment and prioritization of risk areas, rebalancing investment in technology architecture, setting up ‘phased change processes’, proactively managing vendor risk, deeper disaster recovery and business continuity testing, and timely communication during outages.
Constant risk assessment and prioritization: Often, companies do not assess the economic impact of a few days’ outage at their technology vendor, at a factory or in a process. Conducting a periodic assessment of the technology infrastructure and pinpointing critical dependencies, both external and internal, such as at ‘end points’ like servers and devices, can help to prioritize investments to shore up the most vulnerable areas.
Resilient technology architecture: Investing in modern, highly available systems that prioritize resilience and rapid recovery is essential, alongside investing in new features. Cloud technologies are inherently scalable, flexible and ‘re-pavable’: they can be reinitiated with one touch, accelerating recovery. Additionally, ‘geographically resilient’ application architectures – those that distribute data and workloads across multiple geographic locations – further enhance availability during disruptions.
Phased change processes: Resiliency problems often stem from a change – a single configuration change or software update can disrupt an entire system. Deploying a new update to 1% or 5% of nodes first, and widening the rollout only once those nodes prove stable, can significantly reduce disruption in the event of a flawed release. This approach would necessitate more resources but may be worth the investment, given the reduction in risk of disruption.
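For technology teams, the phased approach might look something like the minimal Python sketch below. It is an illustration under stated assumptions rather than a prescribed implementation: the wave sizes beyond 1% and 5% (25% and 100%), the function names, and the `deploy` and `is_healthy` callbacks are all hypothetical stand-ins for whatever deployment and monitoring tooling an organization actually uses.

```python
from typing import Callable, List, Sequence, Tuple


def plan_waves(nodes: Sequence[str],
               percents: Sequence[int] = (1, 5, 25, 100)) -> List[List[str]]:
    """Split the fleet into waves that cumulatively cover 1%, 5%, ... of nodes."""
    waves: List[List[str]] = []
    done = 0
    for pct in percents:
        # Ceiling division, so even a small fleet gets at least one node in wave 1.
        target = min(len(nodes), -(-len(nodes) * pct // 100))
        if target > done:
            waves.append(list(nodes[done:target]))
            done = target
    return waves


def phased_rollout(nodes: Sequence[str],
                   deploy: Callable[[str], None],
                   is_healthy: Callable[[str], bool],
                   percents: Sequence[int] = (1, 5, 25, 100)) -> Tuple[bool, List[str]]:
    """Deploy wave by wave; stop as soon as any node in a wave is unhealthy."""
    updated: List[str] = []
    for wave in plan_waves(nodes, percents):
        for node in wave:
            deploy(node)
            updated.append(node)
        if not all(is_healthy(node) for node in wave):
            return False, updated  # halt: the rest of the fleet is untouched
    return True, updated


# Hypothetical fleet of 200 nodes.
fleet = [f"node-{i:03d}" for i in range(200)]
print([len(wave) for wave in plan_waves(fleet)])  # [2, 8, 40, 150]

# Simulate a flawed release: every updated node reports unhealthy.
deployed: List[str] = []
ok, updated = phased_rollout(fleet, deploy=deployed.append,
                             is_healthy=lambda node: False)
print(ok, len(updated))  # False 2
```

In this simulated flawed release, only 2 of 200 nodes receive the update before the rollout halts, which is the containment effect the phased approach aims for. A real process would also need a rollback step and a soak period between waves, both omitted here for brevity.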
Proactively managing vendor risk: While dependence on external vendors and original equipment manufacturers is inevitable, treating any updates and patches with the same level of scrutiny as internal changes is critical. This would include harmonizing safe software deployment practices with vendors, across diverse configurations, and tighter coordination and recovery procedures to increase incident response effectiveness.
Sufficiently extensive business continuity planning and testing, beyond ‘on paper’: Regular live tests and crisis simulations for a variety of scenarios, rather than paper-based exercises, ensure that response protocols are well coordinated and that decision-making processes will be efficient during outages. This approach would add depth to every company’s disaster recovery and business continuity plans.
Communication and transparency: It is important for businesses to quickly acknowledge the impact and provide regular updates on the progress of remediation. This will help to build trust with employees and customers, and to mitigate the negative effects of disruptions. Some years ago, after a large ransomware attack on one company, its CEO was quick to call customers to explain the incident and the remedial measures being taken. Even years later, customers remember and appreciate this.
The growing cyber threats, and the CrowdStrike incident this year, are a call for organizations to reevaluate technology resiliency. Taking a proactive stance, rebalancing investments to keep systems and infrastructure robust, and prioritizing security enhancements can enable businesses to pre-empt technology disruptions and contain their impact. This would safeguard their operations as well as their reputation and customer relationships.