We stocked up, considered building bunkers, and generally prepared for the first tech apocalypse on January 1, 2000, as if it might mean the end of the world. But the original Year 2K came and went and was nothing compared to Y2K24.
That’s what many are calling the CrowdStrike outage, which caused a global tech disaster of unprecedented proportions.
The details, as we understand them, are these: Cybersecurity firm CrowdStrike delivered a bad piece of code to Windows host systems around the world, causing those Windows PCs and servers to crash and show blue screens. CrowdStrike has thousands of customers, many in business, enterprise, government, travel, healthcare, and more… the list goes on.
Travel was disrupted, healthcare providers were unable to help patients, banks were unavailable, stock markets were closed and shipments stopped. Everything basically went to hell on July 19, a day that will go down in history as the worst IT outage ever and our Y2K24.
I didn’t coin that term.
A bit from me on @CNN this morning talking about the #CloudStrike outage pic.twitter.com/0tckiXxxuj (July 19, 2024)
I spent most of Friday on TV explaining the outage and answering questions. Most of them were about how this could happen, but TV presenters were just as concerned about how we could prevent it from happening again.
The realization is slowly sinking in that the interconnected world we thought we lived in 24 years ago is now real. We thought our globalized system, with everything running on computers that were never programmed to handle the transition to the new millennium, would be our undoing, but it turns out we were missing one key ingredient: the cloud.
In 1999, cloud computing did not exist, and neither did vast services delivered to millions of people via the internet. Today, these services are often updated without customers’ knowledge, preparation or consent.
Most enterprise-grade cloud services (sometimes called Software as a Service or SaaS) do get permission and try to prepare customers. But when you’re trying to stay ahead of ever-changing threat vectors, it can be tough. Zero-day attacks mean you have to deliver that update to customers now.
CrowdStrike hasn’t fully disclosed what exactly happened, or whether the bad code was security-related or simply a feature update. But there’s no doubt that this is the wake-up call we needed.
Our preparation for Y2K seems almost foolish in retrospect, because almost nothing happened. But here we are, 24 hours after the biggest tech crash in history, and some systems are still struggling to recover.
The roots of the global collapse are easy to trace. CrowdStrike serves Windows host systems. Windows is still, by a wide margin, the most popular desktop operating system (Statcounter has it at 72%). It’s like a global single point of failure. Windows had over 95% market share in 1998. Clearly, the missing piece was a dominant cloud service with open-border code delivery to all those Windows systems (that not enough companies had sandboxes for incoming code is another problem).
If we don’t take action now, such as diversifying beyond a single dominant cloud-based provider, this will happen again. In a sense, we got a warning earlier this year when AT&T went down due to another coding error. What’s worse is that we saw how easily the domino effect can spread to other seemingly separate services.
CrowdStrike is a system that spans so many sectors that every time a major outage occurs, everything and everyone is at risk.
Y2K was always real; it just took 24 years to happen. I didn’t say this when I spoke to the presenters, but maybe I should have: I have no idea how we’re going to prepare for the inevitable next global tech crash.