If you’ve been having trouble getting on Facebook and Instagram lately, you’re not alone.
Since the beginning of the year, Meta’s services have suffered 33 outages, including two of the largest since 2022.
While it’s easy to imagine Facebook being attacked by malicious hackers, the truth could be even worse for the company.
Speaking to MailOnline, tech experts revealed that Meta may have created a system that is now too complex to continue running, especially as the company continues to cut staff.
Worryingly, an expert has warned that the problems will only get worse, describing the disruptions as ‘existential’ for Meta.
Cybersecurity experts told MailOnline that Meta’s service outages are due to the company creating a system so complex that it can no longer be properly maintained
Meta’s problem, as cybersecurity expert Dr Junade Ali told MailOnline, is something called ‘technical debt’.
This essentially refers to the fact that big tech companies like Meta have built very complex pieces of the internet based on outdated systems that don’t quite work.
Dr. Ali says, “What happens is that there are ‘legacy systems’ that people don’t have time to fix.”
As Meta has grown and gobbled up services like Instagram and WhatsApp, it has had to make more and more things work on top of this technical debt.
Each of these thousands of different systems and services talks to the others through something called an API, or Application Programming Interface.
These let the complex systems work as a whole, but if something goes wrong in one API, the consequences can quickly spread to many different services.
This means that problems with routine updates and new features can cause cascading effects that can lead to outages big enough for users to notice.
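The cascading effect described above can be illustrated with a small sketch. The service names and the dependency map below are hypothetical and are not taken from Meta's real architecture; the code only shows how a fault in one API can reach every service that calls it, directly or indirectly.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration only):
# each service lists the services whose APIs it calls.
DEPENDS_ON = {
    "auth": [],
    "profile": ["auth"],
    "feed": ["auth", "profile"],
    "messaging": ["auth"],
    "ads": ["feed"],
    "search": [],
}


def affected_by(failed, depends_on):
    """Return every service that directly or indirectly calls the failed one."""
    # Invert the map so we can walk from a service to its callers.
    callers = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            callers[dep].append(service)

    # Breadth-first walk outward from the failed service.
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in callers[current]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return sorted(seen - {failed})


print(affected_by("auth", DEPENDS_ON))    # ['ads', 'feed', 'messaging', 'profile']
print(affected_by("search", DEPENDS_ON))  # []
```

In this toy map, a fault in the shared "auth" service reaches four other services, while a fault in the isolated "search" service reaches none. That is why a failure in a widely used component tends to be the kind users notice.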
Dr. Ali says, “When you work on a computer system like Meta, you’re always releasing new features and performing maintenance… the most important thing is to be able to recover quickly.
‘But if you can no longer keep up with that housekeeping, then things become a lot more striking.’
On March 5 and April 3 this year, Meta services, including Facebook, WhatsApp and Instagram, were down for about two hours.
The issue appeared to be widespread across Meta’s services, affecting Threads and even users of Meta Quest VR headsets.
Meta acknowledged at the time that the services were unavailable and attributed the outages to a “technical problem.”
However, further analysis may provide clarity as to what exactly this ‘technical problem’ might have been.
Although these disruptions were labeled as “server outages,” Meta’s servers never went down and the site remained live the entire time.
On March 5, users were unable to log in to Meta services including Facebook, Instagram, WhatsApp and Threads due to an authentication error.
On April 3, another service outage led 714 people to report via Down Detector that they could not access Facebook.
It is also not very likely that Meta’s servers were targeted by cybercriminals, although this cannot be completely ruled out.
In the immediate aftermath of the March 5 service outage, the hacker group Anonymous appeared to claim responsibility for a cyberattack against the company.
However, Angelique Medina, head of internet intelligence at Cisco ThousandEyes, told MailOnline that human error was a more likely cause.
A cyber attack such as a Distributed Denial of Service (DDoS) attack, in which a company’s systems are overwhelmed by large numbers of requests, would leave a clear trail.
Ms. Medina explains, “If it’s something like a DDoS attack where you see a lot of traffic flooding a particular service, you’re going to see the ripple effects across many different ISPs (internet service providers).”
In her analysis of network traffic around Meta’s services, Ms. Medina found no evidence of these ripple effects.
Hacktivist group Anonymous appeared to claim responsibility for the outage, but it is common for hackers to falsely claim attacks to spread disinformation and boost their credibility.
These diagrams show connections to Meta’s servers during the April 3 service outage. The green colors show that all servers remained active, suggesting that the problem was in Meta’s backend.
This makes it much more likely that Meta’s developers released an update that interacted poorly with the rest of the infrastructure.
Ms Medina explains: ‘What we usually see with these types of outages is that an update may have been made to the application or the underlying infrastructure.
“Disruptions like this are, for lack of a better word, self-inflicted,” she said.
While these two outages were the most notable, they are far from the only times Meta has experienced service outages this year.
In fact, Meta’s service outages seem to be getting worse over time
There were 33 cases of ‘performance degradation’ between January 1 and April 5 – an increase of 154 percent on the same period the year before.
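As a rough check on that percentage, the sketch below works backwards from the two figures in the article (33 incidents, up 154 percent). The prior-year count of about 13 is inferred from those numbers and is not stated in the source; the function names are illustrative.

```python
def percent_increase(previous, current):
    """Percentage change from previous to current."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100


def implied_previous(current, increase_percent):
    """Work backwards from a current value and a stated percentage increase."""
    return current / (1 + increase_percent / 100)


# 33 incidents after a 154 percent rise implies roughly 13 the year before
# (an inferred figure, not one given in the article).
print(round(implied_previous(33, 154), 1))  # 13.0
print(round(percent_increase(13, 33)))      # 154
```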
Because total failures are so costly to the company, they are usually resolved quickly.
If not, the results can be almost catastrophic, as in October 2021, when Meta’s services disappeared for between five and seven hours.
The resulting loss of advertising revenue cost the company an estimated $100 million (£80 million) and wiped five percent off the company’s stock price.
This makes the fact that Meta just saw two global outages, each lasting over two hours, all the more disturbing.
Dr. Ali says: ‘Normally you would try to express the average recovery time in minutes.
“A few hours is quite concerning because it means something went wrong with the detection process or something went wrong with the recovery.”
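To make the idea of average recovery time concrete, here is a minimal sketch that computes mean time to detect and mean time to recover from an incident log. The timestamps are invented for illustration and do not describe any real Meta outage; the point is only that a single two-hour incident pulls an average measured in minutes up sharply.

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Hypothetical incident log: (started, detected, resolved).
# All values are invented for illustration.
INCIDENTS = [
    ("2000-01-10 09:00", "2000-01-10 09:04", "2000-01-10 09:12"),
    ("2000-02-02 14:30", "2000-02-02 14:33", "2000-02-02 14:50"),
    ("2000-03-07 15:00", "2000-03-07 15:20", "2000-03-07 17:00"),
]


def minutes_between(start, end):
    """Elapsed minutes between two timestamp strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


def summarize(incidents):
    """Return (mean minutes to detect, mean minutes to recover)."""
    detect = [minutes_between(s, d) for s, d, _ in incidents]
    recover = [minutes_between(s, r) for s, _, r in incidents]
    return mean(detect), mean(recover)


mttd, mttr = summarize(INCIDENTS)
print(round(mttd, 1), round(mttr, 1))                # 9.0 50.7
print(round(summarize(INCIDENTS[:2])[1], 1))         # 16.0 without the long outage
```

Splitting the figure into detection and recovery mirrors the two failure points Dr. Ali names: a long outage means either the problem was spotted late or the fix took too long.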
If service outages for Meta’s products like Facebook and Instagram continue to worsen, it could cost the company millions in lost ad revenue and hurt its stock price.
More worryingly, some experts are not optimistic that these systems will improve in the future.
Cybersecurity expert James Bore told MailOnline that he expects these issues to become ‘existential’ for Meta.
He says: ‘They are not going to improve, it will never be better than it is now.
“It will continue to spiral out of control and continue to decay and become more and more fragile with more failures… people will lose confidence and eventually I suspect it will just go away.”
Speaking to Mr Bore, a Meta insider reportedly said: ‘They have no control internally.
“They mainly keep it going, and that’s about all they can do.”
The biggest problem, Mr. Bore claims, is that the system has grown too big and does too many things for Meta to keep everything working.
“We’re getting systems that are becoming more and more complex, with more and more bad code, which makes it harder to work with and as time goes on you have to dedicate more and more resources to them,” he says.
This also seems to be an issue that Meta has been aware of for a while.
In an internal Facebook meeting from 2019 that was later leaked to The Verge, Meta CEO Mark Zuckerberg said Facebook’s outages were becoming more serious.
In a leaked audio recording, Meta CEO Mark Zuckerberg (pictured) told employees back in 2019 that the complexity of the company’s system meant small problems were causing “systems to fail.”
Mr Zuckerberg told employees: ‘It’s not that there is one technical issue, except that the complexity of the systems is increasing.
“So things that used to be just a blip are now things that cause systems to collapse.”
Things are now getting worse, Mr Bore and Dr Ali both told MailOnline, as companies like Meta try to cut costs, including staff.
In May last year, Meta laid off more than 10,000 employees, on top of the 11,000 earlier in November 2022, each time losing around 10 percent of its total workforce.
Mr Bore says: ‘We know from experience that taking people out of a system rarely makes it more stable.
“Even if they weren’t particularly good, you just lost ten percent of your hands on the keyboards trying to keep this system running.”