By Tony Anscombe, Chief Security Evangelist, ESET
If a software update process fails, it can lead to catastrophic consequences, as seen today with widespread blue screens of death blamed on a bad update from CrowdStrike. Cybersecurity is often about speed: a threat actor creates a malicious attack technique or code, and cybersecurity companies react to the new threat, adjusting or adopting methods to detect it where necessary.
That adoption may require updating cloud detection systems and/or updating endpoint devices to provide the protection needed against the threat. Speed is of the essence, as the cybersecurity industry is there to protect, detect and respond to threats as they happen.
The processes cybersecurity companies put in place to avoid conflicts between an update and the operating system or other products are typically significant, with automated test environments simulating real-world scenarios across different operating systems, different variants of system drivers and so on. In some instances this may be overseen by humans: a final sign-off confirming that all processes and procedures have been followed and that there are no conflicts. There may also be third parties in the mix, such as an operating system vendor, that test independently of the cybersecurity vendor in an attempt to avert any major outage of the kind we are seeing today.
In a perfect world, a cybersecurity team would take the update and test it in their own environment, ensuring there are no incompatibilities. Once certain the update causes no issues, a scheduled rollout would begin, perhaps one department at a time, reducing the risk of any significant disruption to business operations.
This is not, and cannot be, the process for cybersecurity product updates: they need to deploy at the same speed a threat is distributed, typically near instantly. If the update process fails, it can be catastrophic, as is being played out today with a software update from CrowdStrike, with blue screens of death and entire infrastructures down. This does not signify incompetence on the part of the vendor; it is more likely a case of bad luck, a perfect storm of updates or configurations that created the incident. That is, of course, unless the update has been manipulated by a bad actor, which appears not to be the case in this instance.
What should we take away from this incident?
Firstly, all cybersecurity vendors are likely to be reviewing their update processes to ensure there are no gaps and to see how they can be strengthened. For me, the real lesson is that when a company reaches a significant market position, its dominance can create a semi-monoculture: a single issue will then affect many.
Any cybersecurity professional will use terms like "defense in depth" or "layers of defense". These refer to the use of multiple technologies, and in most cases multiple vendors, to thwart a potential attack; they are also about resilience in the architecture and not relying on a single vendor. Nor should we lose sight of who is ultimately to blame when an incident such as this happens: if cybercriminals and nation-state attackers did not create cyberthreats, we would not need real-time protection in the first place.