Microsoft has released its initial conclusions about what it believes caused a major recent outage that affected some of its most popular software offerings.
The outage prevented employees across Europe and Asia from logging into Microsoft 365 services for hours, with Microsoft Teams, Outlook, OneDrive for Business, Exchange Online and SharePoint all affected.
After initially identifying “a change in wide area network (WAN) routing” as the culprit, Microsoft has now released the findings from its preliminary investigation into the outage, which show that the cause was, in fact, a bit more complicated.
Microsoft Teams outage explained
“Between 07:05 UTC and 12:43 UTC on January 25, 2023, customers experienced network connectivity issues manifested as long network latency and/or timeouts when attempting to connect to resources hosted in Azure regions, as well as other Microsoft services, including Microsoft 365 and Power Platform,” the company’s report said.
“We determined that a change in the Microsoft wide area network (WAN) affected connectivity between clients on the Internet to Azure, connectivity across regions, and cross-premises connectivity over ExpressRoute.”
“As part of a planned change to update the IP address on a WAN router, a command given to the router caused it to send messages to all other routers in the WAN, which resulted in all of them recomputing their adjacency and forwarding tables. During this recomputation process, the routers were unable to correctly forward packets traversing them. The command that caused the issue has different behaviors on different network devices, and the command had not been vetted using our full qualification process on the router on which it was executed.”
Microsoft said it was able to identify the problem within an hour and that all of its internal network equipment was back to normal within two and a half hours.
To help prevent the same problem from happening again, Microsoft has “blocked highly impactful commands from getting executed on the devices.” The company also plans to require that all command execution on its devices follow safe change guidelines.
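A safeguard of this kind can be pictured as a check that runs before any command reaches a device. The Python sketch below is purely illustrative and is not Microsoft's implementation. The command names, device models, and the `is_allowed` helper are all hypothetical, and it assumes a simple rule: a high-impact command may run only on a device model where it has been fully qualified.

```python
# Hypothetical sketch: commands, device models, and rules are invented for illustration.
from dataclasses import dataclass

# Commands treated as high impact (assumed examples, not real vendor syntax).
HIGH_IMPACT_COMMANDS = {"change-router-ip", "reset-routing-process"}

# (command, device model) pairs assumed to have passed a full qualification process.
QUALIFIED = {("change-router-ip", "model-a")}


@dataclass(frozen=True)
class ChangeRequest:
    command: str
    device_model: str


def is_allowed(request: ChangeRequest) -> bool:
    """Allow low-impact commands; allow high-impact ones only on qualified device models."""
    if request.command not in HIGH_IMPACT_COMMANDS:
        return True
    return (request.command, request.device_model) in QUALIFIED


if __name__ == "__main__":
    for req in (
        ChangeRequest("show-interfaces", "model-b"),
        ChangeRequest("change-router-ip", "model-a"),
        ChangeRequest("change-router-ip", "model-b"),
    ):
        verdict = "allowed" if is_allowed(req) else "blocked"
        print(f"{req.command} on {req.device_model}: {verdict}")
```

In this sketch, the same command is permitted on one device model and blocked on another, mirroring the report's point that a command can behave differently across network devices.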