Experts say it could take weeks to fully recover from the IT outage that wreaked havoc around the world on Friday, after airports, healthcare facilities and businesses were hit by the “largest outage in history”.
Flights and hospital appointments were cancelled, payroll systems crashed and TV channels were taken off the air after a botched software upgrade hit Microsoft’s Windows operating system.
The faulty update came from US cybersecurity firm CrowdStrike and left employees facing a “blue screen of death” because their computers would no longer boot. Experts said each affected PC would have to be repaired manually.
In the UK, Whitehall crisis officials coordinated the response through the Cobra Committee. Ministers were in contact with the sectors they oversee to deal with the fallout, and the transport secretary, Louise Haigh, said she was working “at the same pace as the industry” after trains and flights were hit.
A Microsoft spokesperson said: “We are aware of an issue affecting Windows devices due to a third-party software platform update. We expect a resolution to be available shortly.”
CrowdStrike confirmed that the outage was the result of a software update to one of its products, and not a cyberattack. Founder and CEO George Kurtz said he was “deeply sorry for the impact we caused to customers,” adding that there had been a “negative interaction” between the update and Microsoft’s operating system.
CrowdStrike’s share price fell sharply during the day, at one point dropping as much as 13%.
Govia Thameslink Railway (GTR) – the parent company of Southern, Thameslink, Gatwick Express and Great Northern – warned passengers of delays. According to service status monitoring website Downdetector, users across the UK reported problems with services from Visa, BT, major supermarkets, banks, online gaming platforms and media outlets.
Sky News and CBBC were temporarily taken off the air in the UK but later resumed broadcasting. In Australia, the broadcaster ABC was also affected.
In financial services, Metro Bank reported problems with its phone lines in the UK and Santander said card payments “may be affected”. Monzo said some customers were reporting problems, some bankers at JPMorgan were unable to log into their systems, and the London Stock Exchange said there were problems with its news service.
Troy Hunt, a leading cybersecurity consultant, said the scale of the IT outage was unprecedented.
“I don’t think it’s too early to say: this is going to be the biggest IT outage in history,” he tweeted.
“This is basically what we were all worried about with Y2K, except this time it actually happened,” he added, referring to the Y2K bug, which IT experts feared in the run-up to 2000 but which ultimately caused no serious damage.
According to BCS, the Chartered Institute for IT, restoring systems could take anywhere from days to weeks, although some fixes are easier to implement than others.
“In some cases, the fix can be implemented very quickly,” said Adam Leon Smith, a BCS fellow. “But if computers have reacted in a way that causes them to go into blue screens and endless loops, it can be difficult to recover from and that can take days and weeks.”
Alan Woodward, a professor of cybersecurity at the University of Surrey, said the fix required a manual reboot of affected machines and that “most standard users wouldn’t know how to follow the instructions”. Organisations with thousands of PCs spread across multiple locations would have a harder time, he added.
“It’s just the numbers. For some organizations, it could definitely take weeks,” he said.
Among the companies affected on Friday was Ryanair, Europe’s largest airline, which said on its website: “Possible network disruptions due to a global outage of an external system… We advise passengers to arrive at the airport three hours before their flight to avoid disruptions.”
Heathrow, Europe’s largest airport, said it was “working hard” to get passengers “on their way”.
A Heathrow spokesman said: “We continue to work with our airport colleagues to minimise the impact of the global IT outage on passenger travel. Flights remain operational and passengers are advised to check with their airline for the latest flight information.”
In the US, flights were grounded because of communications issues that appeared to be related to the outage. American Airlines, Delta and United Airlines were among the carriers affected. Berlin Airport temporarily halted all flights on Friday. Aviation analytics firm Cirium said 4,295 flights, or 3.9% of scheduled flights, were cancelled worldwide on Friday, including 143 departures to the UK.
GP practices across the UK said they were unable to access patient records or book appointments, with many taking to social media to report that they could not get into the EMIS Web system. 999 services were reportedly unaffected by the outage, but the Royal Surrey NHS Trust, in the south of England, declared a critical incident and cancelled radiotherapy appointments scheduled for Friday morning. The National Pharmacy Association confirmed that UK services could be affected.
A spokesperson for the prime minister, Keir Starmer, said they were not aware of the issue affecting government services, but added that they could see it had wider implications.
Israel’s Health Ministry said “the global outage” had affected 16 hospitals, while in Germany, the University Hospital of Schleswig-Holstein in the north of the country said it had cancelled all planned surgeries in Kiel and Lübeck.
Portland, Oregon, Mayor Ted Wheeler declared a state of emergency, saying certain essential city services, including emergency communications, were affected by the outage.
Woodward said the outage was caused by CrowdStrike Falcon, a product that monitors the security of large computer networks by installing a piece of monitoring software on each machine.
“The product is used by large organizations with a large number of PCs to ensure that everything is monitored. Unfortunately, if they lose all the PCs, they can no longer operate, or only with a much lower level of service,” Woodward said.
Steven Murdoch, professor of security engineering at University College London, said many organisations could struggle to implement the fix quickly.
“The problem occurs before the computer has access to the internet, so there is no way to fix it remotely, so someone has to come in … and fix the problem,” Murdoch said, adding that companies and organisations that had cut back on IT staff or outsourced their IT work would find it harder to deal with the problem.
However, Ciaran Martin, the former head of the National Cyber Security Centre, said that, unlike a hostile cyberattack, the problem had already been identified and a fix announced.
“The recovery is not about getting out of the water, it’s about getting back up. I don’t think it’s going to be very newsworthy in terms of continued disruption next week,” he said.
CrowdStrike’s chief executive, George Kurtz, tweeted that the incident was caused by a “defect found in a single content update for Windows hosts”. He added: “This is not a security incident or cyber attack. The problem has been identified, isolated and a solution has been implemented.”
Problems for US companies were compounded by an outage affecting Microsoft’s Azure cloud computing unit on Thursday.