On July 19, 2024, approximately 8.5 million Windows machines crashed, causing flight cancellations, banking disruptions and media outages around the world. Major US airlines, including American Airlines, United Airlines and Delta, had to cancel flights due to communication problems. Banks and stock exchanges, including the London Stock Exchange, Lloyds Bank and South Africa’s Capitec, faced similar problems. According to data from DownDetector, the outage also affected Visa and Mastercard payment gateways.
The outage led to serious financial problems. For example, Delta’s cancellation of nearly 7,000 flights could cost the company $350 million to $500 million. By some estimates, the total direct loss incurred by U.S. Fortune 500 companies, excluding Microsoft, was $5.4 billion. The healthcare sector was the hardest hit, with expected losses of $1.94 billion, followed by the banking sector with estimated damage of $1.15 billion. The airline industry also experienced significant disruptions, leading to an estimated $860 million in losses.
What went wrong?
The outage was caused by a flaw in an update to the Falcon security platform from information security vendor CrowdStrike, as the company later explained. Interestingly, the update was successfully tested on March 5, but the error went undetected because of a bug in the diagnostic software.
CrowdStrike also noted that it typically delivers security content configuration updates in two ways: through Sensor Content, which ships with the Falcon Sensor component, and through Rapid Response Content, which flags new threats using various behavioral pattern-matching methods. The latter contained the previously undiscovered bug.
Why did this error lead to blue screens around the world? The reason lies in the relationship between endpoint protection software such as Falcon and the operating system. Such software has to monitor the operating system at a very deep level, and it cannot be barred from doing so, because that would give a virus the opportunity to take over. That scenario would defeat the very purpose of a security application, as it would allow malicious actors to bypass the protection measures completely. The flip side is that a fault in software running at this level can take the operating system down with it.
Gradual upgrades and regular backups
Despite the significant impact of the recent incident on businesses and organizations, it is unlikely that CrowdStrike products will be widely abandoned. Solutions like Falcon are deeply embedded in IT infrastructures and have been developed and refined over decades. Replacing them is time-consuming and expensive. Furthermore, there is no guarantee that alternatives would not lead to the same problems.
However, this incident sheds light on some burning issues in the technology industry. One of them is the lack of diversity. Today the market is dominated by just a few major suppliers, and this concentration of control is precisely why the impact of the incident was so widespread. To mitigate such risks in the future, it is critical to develop and invest in alternative solutions, including cloud-based options. This is the most important lesson to take away from this situation.
Furthermore, while responsibility for the incident lies with CrowdStrike, companies must also adopt new approaches to security. One of these is continuously backing up their data. In my opinion, companies that do this regularly were probably less affected by this disruption. System software is usually updated at night or early in the morning, so if something goes wrong, a company with current backups can simply roll the change back. Another suggestion for businesses, and we’ve been saying this over and over for decades, is to implement a backup procedure, run it, and test it regularly.
I also think that companies that keep their infrastructure in the cloud were able to absorb the impact of this outage faster than others, thanks to virtualization and API-based scripts. For virtual machines hosted on AWS and Microsoft Azure, recovery instructions are typically published within hours. Plus, it takes far less time to apply those instructions there than across an entire fleet of bare metal servers. Therefore, more companies are likely to switch to cloud-based solutions. If 20% of companies did that, it would be a fantastic win for our industry, but I think only 5-15% will actually do so.
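As a rough illustration of what "run it and test it" can mean at the smallest scale, the following Python sketch copies a file into a backup directory and then checks that the copy matches the original byte for byte. The function names and file layout are hypothetical and chosen only for this example; a real backup procedure would also cover scheduling, retention, off-site storage and full restore drills.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy `source` into `backup_dir` and confirm the copy is intact.

    Hypothetical helper for illustration: returns True only if the
    backup's checksum matches the original's.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / source.name
    shutil.copy2(source, copy)  # copies file contents and metadata
    return sha256_of(source) == sha256_of(copy)
```

The point of the checksum comparison is that a backup which has never been verified is only an assumption; running a check like this on a schedule turns it into something the company can rely on.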
Future updates
Future updates should also be rolled out gradually. This means upgrading a small subset of systems first, monitoring their performance, and only then expanding the change to a larger group of systems. This strategy takes more time to update everything, but it would help companies avoid the kind of massive damage we saw in this incident.
There are also some steps that regulators can take. Many companies create a risk model to assess potential threats and choose appropriate cyber defense solutions. However, regulators sometimes prescribe specific cybersecurity measures without checking whether every company really needs them. For example, they may require the installation of antivirus software regardless of a company’s actual risk profile. As a result, some companies purchase cybersecurity solutions simply to comply with regulations, rather than based on their actual needs. It is likely that 50% to 90% of affected companies would not have been affected if they had not installed CrowdStrike or other EDR and XDR software products primarily for compliance reasons.
Overall, I hope the situation will bring more positive changes to the industry and help the transition to more robust cybersecurity practices.
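The gradual approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular vendor does it: the `apply_update` and `is_healthy` callbacks, the wave sizes and the 2% failure threshold are all hypothetical placeholders that a real deployment tool would define for itself.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),
    max_failure_rate: float = 0.02,
) -> dict:
    """Update hosts in growing waves and halt if a wave looks unhealthy.

    All parameters are illustrative assumptions: each fraction is the
    cumulative share of the fleet that should be updated after that wave.
    """
    updated: list = []
    done = 0
    for fraction in stage_fractions:
        # Always advance by at least one host, never past the end of the fleet.
        target = min(max(done + 1, int(len(hosts) * fraction)), len(hosts))
        wave = hosts[done:target]
        if not wave:
            continue
        for host in wave:
            apply_update(host)
        failed = [host for host in wave if not is_healthy(host)]
        updated.extend(wave)
        done = target
        if len(failed) / len(wave) > max_failure_rate:
            # Stop here: the rest of the fleet is never touched.
            return {"halted": True, "updated": updated, "failed": list(failed)}
    return {"halted": False, "updated": updated, "failed": []}
```

With a fleet of 100 machines and a faulty update, this sketch would stop after the first wave of a single host, leaving the other 99 untouched. The trade-off is the one noted above: a healthy update takes four waves instead of one to reach everyone.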
This article was produced as part of Ny BreakingPro’s Expert Insights channel, where we profile the best and brightest minds in today’s technology industry. The views expressed here are those of the author and are not necessarily those of Ny BreakingPro or Future plc. If you are interested in contributing, you can read more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro